Which MMSI are Used by Multiple Vessels?

This is the fourth take at getting a good spoofing list. This is a copy of segment_overlap_2015_v4, except changing all 2015 to 2014

1) Eliminate all segments that have fewer than 100 points and are all terrestrial, and which are not clumped right next to the equator or prime meridian. This will get rid of many valid segments, but it will also eliminate most of the noise. (A beter analysis will later get rid of more noise).

SELECT
  * from(
  SELECT
    mmsi,
    seg_id,
    first_timestamp,
    last_timestamp,
    LAG(last_timestamp,1) OVER (PARTITION BY mmsi ORDER BY first_timestamp) last_previous_segment,
    LEAD(first_timestamp,1) OVER (PARTITION BY mmsi ORDER BY first_timestamp) first_next_segment
  FROM
    [scratch_david_seg_analysis.2014_segments]
  WHERE
    (point_count >=100
      OR (point_count>10
        AND terrestrial_positions != point_count))
    AND (last_timestamp - first_timestamp)/3600000000 > 12 // more than 12 hours
    AND NOT ((min_lon >= 0 // these are almost definitely noise
        AND max_lon <= 0.109225)
      OR (min_lat >= 0
        AND max_lat <= 0.109225)) )
WHERE
  mmsi IN (
  SELECT
    mmsi
  FROM
        [scratch_david_mmsi_lists.Combinedfishing_2014])

Save to table: [scratch_david_seg_analysis.good_fishing_segments_2014]

2) Look at segments that overlap, and ignore the ones that should not be brocken up into different segments -- that is, look at the average distance between consecutive points, and determine if they are realistic segments. We have to eliminate these because it looks like the segmenter was dividing what should be continuous segments into different segments, which in turn made some vessels have overlap which clearly did not have overlap.

SELECT
  seg_id
FROM (
  SELECT
    a.mmsi mmsi,
    b.seg_id seg_id,
    b.first_timestamp overalp_start,
    IF(a.last_timestamp < b.last_timestamp, a.last_timestamp, b.last_timestamp) overlap_end,
    /*a.first_timestamp,
      b.first_timestamp,
      a.last_timestamp,
      b.last_timestamp,
      IF(a.last_timestamp < b.last_timestamp, a.last_timestamp - b.first_timestamp, b.last_timestamp - b.first_timestamp)/3600000000 spoofing_hours,*/
  FROM (
    SELECT
      *
    FROM
      [scratch_david_seg_analysis.good_fishing_segments_2014] a
    JOIN
      [scratch_david_seg_analysis.good_fishing_segments_2014] b
    ON
      a.mmsi = b.mmsi)
  WHERE
    a.first_timestamp < b.first_timestamp
    AND b.first_timestamp < a.last_timestamp),
  (
  SELECT
    a.mmsi mmsi,
    a.seg_id seg_id,
    b.first_timestamp overalp_start,
    IF(a.last_timestamp < b.last_timestamp, a.last_timestamp, b.last_timestamp) overlap_end,
    /*a.first_timestamp,
      b.first_timestamp,
      a.last_timestamp,
      b.last_timestamp,
      IF(a.last_timestamp < b.last_timestamp, a.last_timestamp - b.first_timestamp, b.last_timestamp - b.first_timestamp)/3600000000 spoofing_hours,*/
  FROM (
    SELECT
      *
    FROM
      [scratch_david_seg_analysis.good_fishing_segments_2014] a
    JOIN
      [scratch_david_seg_analysis.good_fishing_segments_2014] b
    ON
      a.mmsi = b.mmsi)
  WHERE
    a.first_timestamp < b.first_timestamp
    AND b.first_timestamp < a.last_timestamp)
GROUP BY
  seg_id

Save to [scratch_david_seg_analysis.overlapping_segments_v2_2014] This lists all the segements that are overlapping in time

SELECT
  mmsi,
  seg_id,
  next_seg_id,
  timestamp,
  if (gap_seconds = 0,
    gapmeters/(.000001),
    gapmeters/gap_seconds) speed,
  gapmeters,
  gap_seconds,
FROM (
SELECT
  mmsi,
  seg_id,
  next_seg_id,
  timestamp,
  lat,
  lon,
  next_lat,
  next_lon,
  (ACOS(COS(RADIANS(90-lat)) *COS(RADIANS(90-next_lat)) +SIN(RADIANS(90-lat)) *SIN(RADIANS(90-next_lat)) * COS(RADIANS(lon-next_lon)))*6371000) gapmeters,
  (next_timestamp - timestamp)/1000000 gap_seconds
FROM (
  SELECT
    mmsi,
    seg_id,
    LEAD(seg_id, 1) OVER (PARTITION BY mmsi ORDER BY timestamp) next_seg_id,
    timestamp,
    LEAD(timestamp, 1) OVER (PARTITION BY mmsi ORDER BY timestamp) next_timestamp,
    lat,
    lon,
    LEAD(lat, 1) OVER (PARTITION BY mmsi ORDER BY timestamp) next_lat,
    LEAD(lon, 1) OVER (PARTITION BY mmsi ORDER BY timestamp) next_lon,
  FROM (
    SELECT
      mmsi,
      seg_id,
      lat,
      lon,
      timestamp,
      type
    FROM (
      SELECT
        mmsi,
        seg_id,
        lat,
        lon,
        timestamp,
        type
      FROM
        TABLE_DATE_RANGE([pipeline_classify_fishing.], TIMESTAMP('2014-01-01'), TIMESTAMP('2014-12-31')))
    WHERE
      seg_id IN (
      SELECT
        seg_id
      FROM
        [scratch_david_seg_analysis.overlapping_segments_v2_2014])
      AND type IN (1,2,3,18,19)
      AND lat IS NOT NULL
      AND lon IS NOT NULL) )
WHERE
  seg_id != next_seg_id)

Save to [scratch_david_seg_analysis.overlapping_segments_v2_pointjumps_2014] This cacluates how the vessel jumps back and forth between two segments, calculating the time and distance between two consecutive points that are in different segments.

SELECT
    mmsi,
    seg_id,
    next_seg_id,
    AVG(gap_seconds) avg_gap_seconds,
    SUM(gap_seconds) tot_gap_seconds,
    AVG(gapmeters) avg_gap_meters,
    SUM(gapmeters) tot_gap_meters,
    IF(SUM(gap_seconds)=0, if (avg_gap_meters < 10000, 10,100000), AVG(gapmeters)/AVG(gap_seconds)) avg_speed,
    COUNT(*) jumps
  FROM
    [scratch_david_seg_analysis.overlapping_segments_v2_pointjumps_2014]
  WHERE
    speed IS NOT NULL
    AND gapmeters >=0
  GROUP BY
    mmsi,
    seg_id,
    next_seg_id
  ORDER BY
    avg_speed ASC

Save to [scratch_david_seg_analysis.overlapping_segments_v2_pointjumps_stats_2014]

SELECT
  seg_id,
  next_seg_id,
  tot_gap_seconds,
  tot_gap_meters,
  tot_jumps,
  tot_gap_meters/tot_gap_seconds avg_speed
FROM (
  SELECT
    IF(a.seg_id > a.next_seg_id, a.seg_id, a.next_seg_id) seg_id,
    IF(a.seg_id > a.next_seg_id, a.next_seg_id, a.seg_id) next_seg_id,
    a.tot_gap_seconds + IF(b.tot_gap_seconds IS NULL, 0,b.tot_gap_seconds) tot_gap_seconds,
    a.tot_gap_meters + IF(b.tot_gap_meters IS NULL, 0,b.tot_gap_meters) tot_gap_meters,
    a.jumps + IF(b.jumps IS NULL, 0,b.jumps) tot_jumps,
  FROM (
    SELECT
      *
    FROM
      [scratch_david_seg_analysis.overlapping_segments_v2_pointjumps_stats_2014] a
    LEFT JOIN
      [scratch_david_seg_analysis.overlapping_segments_v2_pointjumps_stats_2014] b
    ON
      a.seg_id = b.next_seg_id
      AND a.next_seg_id = b.seg_id)
  GROUP BY
    seg_id,
    next_seg_id,
    tot_gap_seconds,
    tot_gap_meters,
    tot_jumps )

Save to [scratch_david_seg_analysis.overlapping_segments_v2_pointjumps_stats_b_2014]

The results of these two queries are a list of pairs of segements and the average distance between points on them that are consecutive in time.

3) Sum the overlapping time by vessel, including only the segements above that are physically impossible (in this case where the aveage speed has to be >100 knots in order to jump between the two segments). This is done below.


In [2]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import matplotlib
from matplotlib import colors,colorbar
%matplotlib inline
import csv 
import math
import bq
import time
import cPickle 
from mpl_toolkits.basemap import Basemap

In [4]:
client = bq.Client.Get()
def Query(q):
    t0 = time.time()
    answer = client.ReadTableRows(client.Query(q)['configuration']['query']['destinationTable'])
    print 'Query time: ' + str(time.time() - t0) + ' seconds.'
    return answer

In [5]:
# This query looks at only 2015, and only fishing vessels

q = '''
SELECT
  a.mmsi mmsi,
  ROUND(a.spoofing_hours,1) spoofing_hours,
  ROUND(b.total_hours,1) total_hours,
  ROUND(100*a.spoofing_hours/b.total_hours,2) spoofing_percent,
  a.overlapping_segments overlapping_segments,
  b.num_segments num_segments
FROM (
  SELECT
    mmsi,
    SUM(spoofing_hours) spoofing_hours,
    COUNT(*) overlapping_segments
  FROM (
    SELECT
      *,
      CONCAT(seg_id, next_seg_id) segs
    FROM (
      SELECT
        a.mmsi mmsi,
        IF(a.seg_id > b.seg_id, a.seg_id, b.seg_id) seg_id,
        IF(a.seg_id > b.seg_id, b.seg_id, a.seg_id) next_seg_id,
        IF(a.last_timestamp < b.last_timestamp, a.last_timestamp - b.first_timestamp, b.last_timestamp - b.first_timestamp)/3600000000 spoofing_hours,
      FROM (
        SELECT
          *
        FROM
          [scratch_david_seg_analysis.good_fishing_segments_2014] a
        JOIN
          [scratch_david_seg_analysis.good_fishing_segments_2014] b
        ON
          a.mmsi = b.mmsi)
      WHERE
        a.first_timestamp < b.first_timestamp
        AND b.first_timestamp < a.last_timestamp)
    HAVING
      segs IN (
      SELECT
        CONCAT(seg_id, next_seg_id) segs
      FROM
        [scratch_david_seg_analysis.overlapping_segments_v2_pointjumps_stats_b_2014]
      WHERE
        avg_speed > 100))
  GROUP BY
    mmsi
  ORDER BY
    spoofing_hours DESC ) a
JOIN (
  SELECT
    mmsi,
    SUM((last_timestamp - first_timestamp))/3600000000 total_hours,
    COUNT(*) num_segments
  FROM
    [scratch_david_seg_analysis.good_fishing_segments_2015]
  GROUP BY
    mmsi ) b
ON
  a.mmsi = b.mmsi
ORDER BY
  spoofing_hours DESC'''

overlaps = Query(q)


Waiting on bqjob_r2bf70ce12a8ad21d_000001547c413f2d_1 ... (7s) Current status: DONE   
Query time: 10.7970309258 seconds.

In [6]:
print "mmsi spoofing_hours total_hours spoofing_Percent overlapping_segments"
for i in range(len(overlaps)):
    o = overlaps[i]
    print o[0],o[1],o[2],o[3]+"%", o[4]


mmsi spoofing_hours total_hours spoofing_Percent overlapping_segments
123456789 148422.0 85481.0 173.63% 3962
440773000 8636.7 10622.4 81.31% 35
316012880 5669.3 6737.9 84.14% 147
440787000 5054.7 12322.9 41.02% 147
367061930 3451.8 14759.5 23.39% 58
701007014 1432.4 14898.9 9.61% 41
412420502 1215.0 9962.9 12.2% 26
413222000 987.6 7300.3 13.53% 9
412888887 870.8 1352.1 64.41% 17
701006034 842.7 7703.8 10.94% 17
412331035 817.8 3760.7 21.75% 14
441072000 803.6 8752.6 9.18% 13
412400155 782.5 7387.5 10.59% 12
345140500 671.3 7295.4 9.2% 30
412448626 578.3 1570.4 36.82% 12
257303420 509.1 5181.6 9.83% 2
412447093 499.3 5098.4 9.79% 10
412412444 476.0 3946.2 12.06% 19
412420197 458.5 9600.9 4.78% 24
412201233 412.1 2439.2 16.9% 11
238123456 399.7 8127.3 4.92% 9
412413353 398.6 2418.0 16.49% 6
412320033 386.1 4777.7 8.08% 12
412420168 374.6 4665.9 8.03% 12
413666661 332.6 3020.7 11.01% 10
416004507 326.2 6777.9 4.81% 10
412449961 305.0 3927.6 7.77% 9
800000029 300.7 454.3 66.18% 5
900300308 300.7 902.0 33.34% 6
412326617 256.7 791.8 32.42% 13
412001568 246.1 783.6 31.41% 5
412421701 241.6 8606.0 2.81% 8
412422352 241.0 5154.7 4.68% 5
416000004 223.6 2788.7 8.02% 3
789899988 222.1 151.5 146.56% 6
235069008 220.6 7561.7 2.92% 10
412415282 217.1 2607.1 8.33% 6
413061986 214.8 1708.7 12.57% 3
412448771 210.0 2247.4 9.35% 5
412025120 207.3 2636.8 7.86% 4
441251000 187.1 8328.2 2.25% 4
412440151 186.1 3565.2 5.22% 8
412004128 186.1 1586.8 11.73% 3
412010156 184.3 1471.3 12.53% 6
412443779 183.1 2924.9 6.26% 5
412205637 181.2 2044.1 8.86% 11
412450287 180.6 2465.9 7.32% 3
200000051 179.4 2722.2 6.59% 5
412420163 175.7 5300.4 3.32% 5
412001888 167.3 609.2 27.46% 4
412449889 166.8 749.8 22.25% 6
412328237 164.4 491.5 33.45% 3
412420809 159.2 8075.6 1.97% 5
412349042 139.4 859.7 16.22% 6
725003670 138.4 8604.6 1.61% 4
212121212 136.5 1195.4 11.42% 6
440125190 130.2 3997.1 3.26% 4
412448839 130.1 1395.0 9.33% 4
800033690 129.7 2494.5 5.2% 3
412449914 123.9 2841.5 4.36% 4
412320023 122.5 6246.6 1.96% 6
503595000 122.4 2750.8 4.45% 1
412333567 120.2 1457.1 8.25% 1
412420331 117.2 7200.8 1.63% 6
412410675 116.4 2653.3 4.39% 6
412482028 110.7 674.6 16.41% 3
412213456 107.0 1234.9 8.67% 3
412440077 106.7 624.6 17.08% 1
412424491 99.6 5796.2 1.72% 3
412449946 99.2 2807.2 3.53% 2
413999666 96.5 1100.4 8.77% 3
412430362 95.4 2796.5 3.41% 4
257405320 92.7 5958.3 1.56% 3
412444545 91.7 3173.7 2.89% 3
701000680 89.5 2910.0 3.07% 4
412449817 86.5 2947.7 2.93% 3
412412222 84.9 1296.5 6.55% 3
412324222 84.7 1261.2 6.72% 3
412469999 82.0 1588.2 5.16% 1
412438724 82.0 1376.5 5.96% 2
412555588 81.1 2215.4 3.66% 2
200000037 81.0 311.6 25.98% 2
412349156 80.9 2615.4 3.09% 5
412323927 79.1 731.0 10.83% 10
412450061 78.4 3445.9 2.28% 2
412440000 78.2 3543.7 2.21% 2
770576117 77.1 9040.0 0.85% 7
412475392 76.4 488.8 15.63% 1
412444965 74.5 3511.9 2.12% 5
225374000 72.9 8840.0 0.82% 3
800036789 70.4 2378.7 2.96% 3
412423251 69.2 3959.8 1.75% 3
800019462 68.8 1532.6 4.49% 3
412205402 68.3 1639.1 4.17% 4
800003372 68.2 2376.1 2.87% 3
900403310 68.0 678.3 10.02% 4
412435964 64.9 455.4 14.26% 1
412437731 64.5 2669.2 2.42% 3
900401208 64.4 395.7 16.28% 1
412201512 64.4 178.4 36.11% 3
247120500 63.5 2315.3 2.74% 1
412470191 62.2 2082.9 2.99% 2
412678050 61.4 9539.9 0.64% 2
412427858 59.0 3889.5 1.52% 2
412328786 58.7 8910.0 0.66% 2
412300088 57.9 32.4 178.77% 1
512085000 57.5 8848.9 0.65% 1
412401220 57.5 4776.5 1.2% 2
926000028 57.3 1019.8 5.62% 1
412325431 57.0 612.0 9.32% 3
412449638 56.8 2699.2 2.1% 3
701000645 56.2 6328.7 0.89% 2
412205577 55.8 1148.0 4.86% 2
412435798 55.2 520.6 10.59% 1
251116940 53.6 349.6 15.32% 1
412004569 51.4 1412.9 3.64% 2
412369868 51.1 3090.1 1.65% 1
800031309 51.0 3105.8 1.64% 1
412413018 48.9 114.7 42.67% 3
412323556 48.8 232.1 21.04% 2
412438813 48.5 2146.3 2.26% 2
412358077 48.5 362.0 13.39% 1
224587000 48.5 2988.6 1.62% 1
412433937 48.4 2015.9 2.4% 1
412202222 45.7 7098.0 0.64% 1
412420934 45.6 7349.5 0.62% 1
412420854 45.1 8703.8 0.52% 2
998877000 44.0 1532.6 2.87% 2
412025012 44.0 956.5 4.6% 4
412212945 43.8 987.8 4.44% 2
412320048 42.5 533.1 7.98% 1
412449701 42.2 710.5 5.94% 1
412413553 40.0 452.7 8.84% 1
412418767 40.0 1297.8 3.08% 2
412430028 38.7 1482.6 2.61% 2
234781000 38.6 7109.1 0.54% 1
251392000 38.5 8120.4 0.47% 3
412201131 38.2 468.7 8.16% 1
412449218 37.2 1042.5 3.57% 1
412213679 36.6 906.7 4.04% 2
271072131 36.0 772.8 4.65% 1
412449096 35.9 2843.0 1.26% 1
412466666 35.9 5140.8 0.7% 3
366245070 35.7 3425.2 1.04% 2
412449632 35.6 1958.8 1.82% 2
412256789 34.5 991.6 3.48% 1
412412223 32.1 1280.8 2.51% 1
412320397 32.0 3693.7 0.87% 2
412320898 31.5 1597.2 1.97% 1
412443945 31.4 512.5 6.13% 1
412325423 31.2 2170.4 1.44% 4
412413948 30.6 1791.1 1.71% 1
235004129 30.3 5270.2 0.57% 1
412439068 30.3 2881.9 1.05% 1
413887799 30.1 1581.8 1.9% 1
412214567 29.5 792.3 3.73% 3
412442015 29.3 660.7 4.44% 2
701000880 29.1 8753.4 0.33% 2
412364651 28.5 523.4 5.44% 2
412325694 28.1 1267.5 2.22% 2
412450031 27.9 2657.1 1.05% 3
412449965 27.5 2052.2 1.34% 1
273327440 27.3 5750.8 0.48% 1
800003397 27.0 232.3 11.64% 1
412449975 26.3 702.0 3.74% 2
261005080 26.0 3061.6 0.85% 1
412025038 25.7 1192.1 2.15% 1
412321231 24.7 197.8 12.49% 3
412422583 24.6 1710.9 1.44% 2
412329607 24.4 576.6 4.23% 1
412001508 24.2 1437.6 1.68% 2
412437934 23.6 769.7 3.07% 2
412422693 23.3 3746.4 0.62% 4
412442049 23.2 1295.7 1.79% 1
412202056 22.0 807.8 2.72% 1
800037085 21.9 1854.5 1.18% 1
900403180 21.7 122.7 17.7% 1
412025066 21.1 2465.8 0.85% 2
412000122 20.9 175.4 11.93% 2
440654000 20.4 8760.1 0.23% 1
412437086 20.4 484.3 4.22% 1
412005767 19.8 984.0 2.01% 2
412200008 19.6 920.0 2.13% 1
412025317 18.8 2017.3 0.93% 1
412324259 18.4 350.1 5.26% 1
416002232 18.3 6232.4 0.29% 1
412412165 16.2 1082.8 1.5% 1
251408110 16.2 8772.2 0.18% 1
412325298 15.9 2677.0 0.6% 2
512000089 15.5 5558.3 0.28% 1
412555589 15.2 1832.4 0.83% 1
415000019 15.1 2569.5 0.59% 1
900401128 14.3 119.9 11.94% 1
412444513 14.2 2873.6 0.49% 5
224188000 13.3 1616.3 0.82% 1
412450328 12.9 538.9 2.39% 1
412432063 12.4 2215.7 0.56% 1
245703000 12.3 8005.7 0.15% 1
900404298 12.2 260.7 4.67% 1
412325275 11.6 811.3 1.43% 1
412438888 10.9 3477.3 0.31% 1
251287110 10.7 8753.8 0.12% 1
412324496 10.4 249.0 4.19% 1
412324128 10.1 1911.7 0.53% 1
412211646 9.7 85.5 11.31% 1
251598110 9.5 7798.1 0.12% 1
701000674 9.0 3404.8 0.26% 1
412410395 8.7 1250.8 0.69% 2
412419204 8.4 386.8 2.16% 1
412330798 8.4 355.9 2.37% 2
251153000 8.1 8382.5 0.1% 1
412412311 7.9 351.6 2.24% 1
412422197 7.7 1838.3 0.42% 1
412413255 7.4 1713.3 0.43% 1
412205146 7.3 808.0 0.91% 1
412445202 7.3 2749.4 0.26% 4
412446107 7.0 1050.6 0.66% 2
150203711 6.9 3093.7 0.22% 1
412423672 6.9 2752.0 0.25% 1
412188188 6.9 1393.9 0.49% 1
412326235 6.6 540.0 1.23% 1
412323503 6.5 199.0 3.24% 1
412427888 6.3 2113.4 0.3% 1
412326129 6.2 1322.4 0.47% 2
412480368 6.1 539.0 1.13% 1
412567898 5.3 1564.8 0.34% 1
412410394 5.1 822.0 0.62% 1
412422536 4.9 675.3 0.73% 1
412428476 4.8 2397.6 0.2% 1
412325299 4.6 681.4 0.67% 2
440100970 4.6 1321.4 0.35% 1
412323876 4.6 607.3 0.75% 1
412442237 4.5 937.7 0.48% 1
412413049 4.3 2150.0 0.2% 1
926011072 3.9 140.3 2.76% 1
412412004 3.7 985.5 0.38% 1
412322699 3.5 813.4 0.43% 1
800000331 3.4 613.8 0.55% 1
412433676 3.3 1011.6 0.33% 1
800004542 3.2 256.2 1.23% 2
412324962 2.7 435.1 0.62% 1
412443946 2.6 748.8 0.35% 1
800016789 2.5 670.0 0.37% 1
235002051 2.5 2659.9 0.09% 1
412325345 2.4 551.4 0.43% 1
412420455 2.2 2767.2 0.08% 1
412321098 2.1 425.1 0.48% 1
412321464 2.1 3338.8 0.06% 3
412442741 1.9 230.5 0.81% 1
251141110 1.9 8758.3 0.02% 1
412256718 1.8 534.8 0.33% 1
412214299 1.7 565.2 0.29% 1
412123111 1.7 1608.6 0.11% 1
412414581 1.6 1477.5 0.11% 1
367070360 1.6 8056.9 0.02% 1
412320038 1.6 2322.1 0.07% 1
412461618 1.2 321.4 0.37% 2
412323982 1.2 1412.0 0.08% 1
412451066 1.1 882.3 0.13% 1
412460888 1.1 626.9 0.18% 1
412206929 1.1 188.4 0.57% 1
412478293 0.9 1331.7 0.07% 2
251156000 0.9 8757.8 0.01% 1
400000012 0.7 2722.7 0.03% 1
412223456 0.7 1372.4 0.05% 1
412324201 0.5 815.3 0.07% 1
412205868 0.4 680.9 0.06% 1
416002923 0.2 6615.1 0.0% 1
412449123 0.2 826.2 0.03% 1
412445104 0.1 1555.0 0.01% 1
760999991 0.1 2790.2 0.0% 1

In [7]:
print "total number of vessels with at least one overlap:", len(overlaps)


total number of vessels with at least one overlap: 271

In [8]:
for i in range(len(overlaps)):
    o = overlaps[i]
    if o[0] == '258154000':
        print o

Let's map some of these


In [11]:
def map_mmsi(mmsi, title = ""):
    if title == "":
        title = 'Vessel '+str(mmsi)
    plt.rcParams["figure.figsize"] = [12,7]
    q = ''' select * from (select timestamp, lat, lon, speed, seg_id 
    FROM (TABLE_DATE_RANGE([pipeline_classify.], TIMESTAMP('2014-01-01'), TIMESTAMP('2014-12-31')))
    where mmsi = '''+str(mmsi) + '''
    and lat is not null and lon is not null 
    and lat <90 and lat>-90 and lon<180 and lon>-180)
    where seg_id in (select seg_id from [scratch_david_seg_analysis.good_fishing_segments_2014]
    where mmsi ='''+str(mmsi) + ''' )
    order by timestamp'''

    ves = Query(q)

    lats = np.array([float(v[1]) for v in ves])
    lons = np.array([float(v[2]) for v in ves])

    min_lat = min(lats)
    min_lon = min(lons)
    max_lat = max(lats)
    max_lon = max(lons)

    if max_lat>90: max_lat = 90
    if min_lat<-90: min_lat = -90

    m =\
    Basemap(llcrnrlon=min_lon,llcrnrlat=min_lat,urcrnrlon=max_lon,urcrnrlat=max_lat,projection='mill', resolution ='l')

    m.drawmapboundary()
    m.fillcontinents("#cccccc")
    x,y= m(lons,lats)
    color =  "#7879FA"   
    size = 2
    m.plot(x, y, lw=1, color = "#000000",alpha=.5)
    m.plot(x, y, marker = 'o', alpha=.5,lw=0, ms=5,markeredgecolor = 'none',color = "#FF0000")
    plt.title(title)   
    plt.savefig(str(mmsi)+".png",bbox_inches='tight',dpi=900,transparent=True,pad_inches=.1)
    plt.show()
    plt.clf()

In [12]:
for i in range(50,len(overlaps),25): # no need to start on the obvious spoofing one
    o = overlaps[i]
    map_mmsi(o[0], "Vessel "+o[0] + ", "+o[1]+" hours spoofing, "+o[3]+"% time spoofing, "+o[4]+" overlapping segs: ")


Waiting on bqjob_r2ec8b9fec8724966_000001547c579e05_3 ... (23s) Current status: DONE   
Query time: 27.6318910122 seconds.
Waiting on bqjob_r5666a2e984bca7aa_000001547c58560c_4 ... (23s) Current status: DONE   
Query time: 28.3994309902 seconds.
Waiting on bqjob_r57999a1e7fb415dc_000001547c590f6c_5 ... (14s) Current status: DONE   
Query time: 19.0840349197 seconds.
Waiting on bqjob_r54eca88177cb1f3f_000001547c59926e_6 ... (23s) Current status: DONE   
Query time: 28.41522789 seconds.
Waiting on bqjob_r2b1cdfef38cf259b_000001547c5a43fa_7 ... (15s) Current status: DONE   
Query time: 19.6467740536 seconds.
Waiting on bqjob_r721687483a980c69_000001547c5adf8e_8 ... (23s) Current status: DONE   
Query time: 27.2707209587 seconds.
Waiting on bqjob_r7d96f3913fa32acd_000001547c5b9796_9 ... (135s) Current status: DONE   
Query time: 140.469916105 seconds.
Waiting on bqjob_r58e5e5afe901243f_000001547c5e0379_10 ... (23s) Current status: DONE   
Query time: 27.4106340408 seconds.
Waiting on bqjob_r89f12b7a57a0f28_000001547c5eb48c_11 ... (14s) Current status: DONE   
Query time: 19.5673210621 seconds.
<matplotlib.figure.Figure at 0x103d5cc90>

In [13]:
import csv
with open('spoofing_hours_2014.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(["mmsi","spoofing_hours","total_hours","spoofing_percent","overlapping_segments","num_segments"])
    writer.writerows(overlaps)

In [ ]: