Which MMSI are Used by Multiple Vessels?

This is the fourth take at getting a good spoofing list

1) Eliminate all segments that have fewer than 100 points and are all terrestrial, and which are not clumped right next to the equator or prime meridian. This will get rid of many valid segments, but it will also eliminate most of the noise. (A beter analysis will later get rid of more noise).

SELECT
  * from(
  SELECT
    mmsi,
    seg_id,
    first_timestamp,
    last_timestamp,
    LAG(last_timestamp,1) OVER (PARTITION BY mmsi ORDER BY first_timestamp) last_previous_segment,
    LEAD(first_timestamp,1) OVER (PARTITION BY mmsi ORDER BY first_timestamp) first_next_segment
  FROM
    [scratch_david_seg_analysis.2015_segments]
  WHERE
    (point_count >=100
      OR (point_count>10
        AND terrestrial_positions != point_count))
    AND (last_timestamp - first_timestamp)/3600000000 > 12 // more than 12 hours
    AND NOT ((min_lon >= 0 // these are almost definitely noise
        AND max_lon <= 0.109225)
      OR (min_lat >= 0
        AND max_lat <= 0.109225)) )
WHERE
  mmsi IN (
  SELECT
    mmsi
  FROM
    [scratch_bjorn.2015_combined_fishing])

Save to table: [scratch_david_seg_analysis.good_fishing_segments_2015]

2) Look at segments that overlap, and ignore the ones that should not be brocken up into different segments -- that is, look at the average distance between consecutive points, and determine if they are realistic segments. We have to eliminate these because it looks like the segmenter was dividing what should be continuous segments into different segments, which in turn made some vessels have overlap which clearly did not have overlap.

SELECT
  seg_id
FROM (
  SELECT
    a.mmsi mmsi,
    b.seg_id seg_id,
    b.first_timestamp overalp_start,
    IF(a.last_timestamp < b.last_timestamp, a.last_timestamp, b.last_timestamp) overlap_end,
    /*a.first_timestamp,
      b.first_timestamp,
      a.last_timestamp,
      b.last_timestamp,
      IF(a.last_timestamp < b.last_timestamp, a.last_timestamp - b.first_timestamp, b.last_timestamp - b.first_timestamp)/3600000000 spoofing_hours,*/
  FROM (
    SELECT
      *
    FROM
      [scratch_david_seg_analysis.good_fishing_segments_2015] a
    JOIN
      [scratch_david_seg_analysis.good_fishing_segments_2015] b
    ON
      a.mmsi = b.mmsi)
  WHERE
    a.first_timestamp < b.first_timestamp
    AND b.first_timestamp < a.last_timestamp),
  (
  SELECT
    a.mmsi mmsi,
    a.seg_id seg_id,
    b.first_timestamp overalp_start,
    IF(a.last_timestamp < b.last_timestamp, a.last_timestamp, b.last_timestamp) overlap_end,
    /*a.first_timestamp,
      b.first_timestamp,
      a.last_timestamp,
      b.last_timestamp,
      IF(a.last_timestamp < b.last_timestamp, a.last_timestamp - b.first_timestamp, b.last_timestamp - b.first_timestamp)/3600000000 spoofing_hours,*/
  FROM (
    SELECT
      *
    FROM
      [scratch_david_seg_analysis.good_fishing_segments_2015] a
    JOIN
      [scratch_david_seg_analysis.good_fishing_segments_2015] b
    ON
      a.mmsi = b.mmsi)
  WHERE
    a.first_timestamp < b.first_timestamp
    AND b.first_timestamp < a.last_timestamp)
GROUP BY
  seg_id

Save to [scratch_david_seg_analysis.overlapping_segments_v2] This lists all the segements that are overlapping in time

SELECT
  mmsi,
  seg_id,
  next_seg_id,
  timestamp,
  if (gap_seconds = 0,
    gapmeters/(.000001),
    gapmeters/gap_seconds) speed,
  gapmeters,
  gap_seconds,
FROM (
SELECT
  mmsi,
  seg_id,
  next_seg_id,
  timestamp,
  lat,
  lon,
  next_lat,
  next_lon,
  (ACOS(COS(RADIANS(90-lat)) *COS(RADIANS(90-next_lat)) +SIN(RADIANS(90-lat)) *SIN(RADIANS(90-next_lat)) * COS(RADIANS(lon-next_lon)))*6371000) gapmeters,
  (next_timestamp - timestamp)/1000000 gap_seconds
FROM (
  SELECT
    mmsi,
    seg_id,
    LEAD(seg_id, 1) OVER (PARTITION BY mmsi ORDER BY timestamp) next_seg_id,
    timestamp,
    LEAD(timestamp, 1) OVER (PARTITION BY mmsi ORDER BY timestamp) next_timestamp,
    lat,
    lon,
    LEAD(lat, 1) OVER (PARTITION BY mmsi ORDER BY timestamp) next_lat,
    LEAD(lon, 1) OVER (PARTITION BY mmsi ORDER BY timestamp) next_lon,
  FROM (
    SELECT
      mmsi,
      seg_id,
      lat,
      lon,
      timestamp,
      type
    FROM (
      SELECT
        mmsi,
        seg_id,
        lat,
        lon,
        timestamp,
        type
      FROM
        TABLE_DATE_RANGE([pipeline_classify_fishing.], TIMESTAMP('2015-01-01'), TIMESTAMP('2015-12-31')))
    WHERE
      seg_id IN (
      SELECT
        seg_id
      FROM
        [scratch_david_seg_analysis.overlapping_segments_v2])
      AND type IN (1,2,3,18,19)
      AND lat IS NOT NULL
      AND lon IS NOT NULL) )
WHERE
  seg_id != next_seg_id)

Save to [scratch_david_seg_analysis.overlapping_segments_v2_pointjumps] This cacluates how the vessel jumps back and forth between two segments, calculating the time and distance between two consecutive points that are in different segments.

SELECT
    mmsi,
    seg_id,
    next_seg_id,
    AVG(gap_seconds) avg_gap_seconds,
    SUM(gap_seconds) tot_gap_seconds,
    AVG(gapmeters) avg_gap_meters,
    SUM(gapmeters) tot_gap_meters,
    IF(SUM(gap_seconds)=0, if (avg_gap_meters < 10000, 10,100000), AVG(gapmeters)/AVG(gap_seconds)) avg_speed,
    COUNT(*) jumps
  FROM
    [scratch_david_seg_analysis.overlapping_segments_v2_pointjumps]
  WHERE
    speed IS NOT NULL
    AND gapmeters >=0
  GROUP BY
    mmsi,
    seg_id,
    next_seg_id
  ORDER BY
    avg_speed ASC

Save to [scratch_david_seg_analysis.overlapping_segments_v2_pointjumps_stats]

SELECT
  seg_id,
  next_seg_id,
  tot_gap_seconds,
  tot_gap_meters,
  tot_jumps,
  tot_gap_meters/tot_gap_seconds avg_speed
FROM (
  SELECT
    IF(a.seg_id > a.next_seg_id, a.seg_id, a.next_seg_id) seg_id,
    IF(a.seg_id > a.next_seg_id, a.next_seg_id, a.seg_id) next_seg_id,
    a.tot_gap_seconds + IF(b.tot_gap_seconds IS NULL, 0,b.tot_gap_seconds) tot_gap_seconds,
    a.tot_gap_meters + IF(b.tot_gap_meters IS NULL, 0,b.tot_gap_meters) tot_gap_meters,
    a.jumps + IF(b.jumps IS NULL, 0,b.jumps) tot_jumps,
  FROM (
    SELECT
      *
    FROM
      [scratch_david_seg_analysis.overlapping_segments_v2_pointjumps_stats] a
    LEFT JOIN
      [scratch_david_seg_analysis.overlapping_segments_v2_pointjumps_stats] b
    ON
      a.seg_id = b.next_seg_id
      AND a.next_seg_id = b.seg_id)
  GROUP BY
    seg_id,
    next_seg_id,
    tot_gap_seconds,
    tot_gap_meters,
    tot_jumps )

Save to [scratch_david_seg_analysis.overlapping_segments_v2_pointjumps_stats_b]

The results of these two queries are a list of pairs of segements and the average distance between points on them that are consecutive in time.

3) Sum the overlapping time by vessel, including only the segements above that are physically impossible (in this case where the aveage speed has to be >100 knots in order to jump between the two segments). This is done below.


In [2]:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
import matplotlib
from matplotlib import colors,colorbar
%matplotlib inline
import csv 
import math
import bq
import time
import cPickle 
from mpl_toolkits.basemap import Basemap

In [3]:
client = bq.Client.Get()
def Query(q):
    t0 = time.time()
    answer = client.ReadTableRows(client.Query(q)['configuration']['query']['destinationTable'])
    print 'Query time: ' + str(time.time() - t0) + ' seconds.'
    return answer

In [4]:
# This query looks at only 2015, and only fishing vessels

q = '''
SELECT
  a.mmsi mmsi,
  ROUND(a.spoofing_hours,1) spoofing_hours,
  ROUND(b.total_hours,1) total_hours,
  ROUND(100*a.spoofing_hours/b.total_hours,2) spoofing_percent,
  a.overlapping_segments overlapping_segments,
  b.num_segments num_segments
FROM (
  SELECT
    mmsi,
    SUM(spoofing_hours) spoofing_hours,
    COUNT(*) overlapping_segments
  FROM (
    SELECT
      *,
      CONCAT(seg_id, next_seg_id) segs
    FROM (
      SELECT
        a.mmsi mmsi,
        IF(a.seg_id > b.seg_id, a.seg_id, b.seg_id) seg_id,
        IF(a.seg_id > b.seg_id, b.seg_id, a.seg_id) next_seg_id,
        IF(a.last_timestamp < b.last_timestamp, a.last_timestamp - b.first_timestamp, b.last_timestamp - b.first_timestamp)/3600000000 spoofing_hours,
      FROM (
        SELECT
          *
        FROM
          [scratch_david_seg_analysis.good_fishing_segments_2015] a
        JOIN
          [scratch_david_seg_analysis.good_fishing_segments_2015] b
        ON
          a.mmsi = b.mmsi)
      WHERE
        a.first_timestamp < b.first_timestamp
        AND b.first_timestamp < a.last_timestamp)
    HAVING
      segs IN (
      SELECT
        CONCAT(seg_id, next_seg_id) segs
      FROM
        [scratch_david_seg_analysis.overlapping_segments_v2_pointjumps_stats_b]
      WHERE
        avg_speed > 100))
  GROUP BY
    mmsi
  ORDER BY
    spoofing_hours DESC ) a
JOIN (
  SELECT
    mmsi,
    SUM((last_timestamp - first_timestamp))/3600000000 total_hours,
    COUNT(*) num_segments
  FROM
    [scratch_david_seg_analysis.good_fishing_segments_2015]
  GROUP BY
    mmsi ) b
ON
  a.mmsi = b.mmsi
ORDER BY
  spoofing_hours DESC'''

overlaps = Query(q)


Waiting on bqjob_r34289ec091ad6f61_0000015473276fb6_1 ... (0s) Current status: DONE   
Query time: 2.05902695656 seconds.

In [5]:
print "mmsi spoofing_hours total_hours spoofing_Percent overlapping_segments"
for i in range(len(overlaps)):
    o = overlaps[i]
    print o[0],o[1],o[2],o[3]+"%", o[4]


mmsi spoofing_hours total_hours spoofing_Percent overlapping_segments
123456789 395706.1 85481.0 462.92% 8439
222222222 65731.9 35657.9 184.34% 1886
701007014 5481.0 14898.9 36.79% 85
354622000 5191.3 13274.7 39.11% 63
412331079 4431.5 12756.8 34.74% 47
440787000 3559.0 12322.9 28.88% 101
412333333 3134.4 10122.5 30.96% 73
412420405 2682.0 11301.9 23.73% 23
367061930 2613.2 14759.5 17.71% 104
412420502 2556.6 9962.9 25.66% 59
440773000 1945.4 10622.4 18.31% 104
412200146 1426.1 6681.3 21.34% 26
412400155 1391.1 7387.5 18.83% 13
412320023 1337.4 6246.6 21.41% 18
440100000 1129.0 9856.1 11.45% 13
416002477 1104.2 7310.7 15.1% 9
345050700 1085.1 9184.6 11.81% 49
273892110 1043.5 9635.9 10.83% 24
312854000 997.1 7691.0 12.96% 17
345140500 961.6 7295.4 13.18% 84
412466666 941.1 5140.8 18.31% 20
701006034 874.9 7703.8 11.36% 8
412447093 872.7 5098.4 17.12% 18
413222000 836.2 7300.3 11.45% 7
412420197 820.8 9600.9 8.55% 20
412678050 812.2 9539.9 8.51% 12
257222222 795.8 3728.5 21.34% 2
273828110 754.5 5426.0 13.9% 15
412202222 736.5 7098.0 10.38% 13
245449000 722.6 8480.8 8.52% 27
271072246 632.0 3616.0 17.48% 7
412686020 610.4 3492.0 17.48% 8
238123456 603.3 8127.3 7.42% 14
412420944 601.2 8219.4 7.31% 15
366270000 596.7 9015.9 6.62% 12
412420956 583.1 4204.3 13.87% 9
412412444 558.3 3946.2 14.15% 19
800033080 546.1 4034.0 13.54% 6
416002251 468.1 2148.0 21.79% 4
412422352 465.6 5154.7 9.03% 12
312996000 441.7 5427.7 8.14% 3
412412442 428.4 2897.4 14.79% 7
512085000 428.2 8848.9 4.84% 5
416003451 427.1 8621.5 4.95% 3
412328786 422.3 8910.0 4.74% 13
367411970 407.7 7832.2 5.21% 8
412523126 333.3 3018.5 11.04% 10
353038000 318.4 3426.1 9.29% 4
412430362 304.2 2796.5 10.88% 5
412444555 300.2 2788.5 10.76% 5
367647210 296.3 8365.2 3.54% 2
412331078 282.0 6688.1 4.22% 6
770576062 251.2 2688.3 9.34% 5
316012880 249.9 6737.9 3.71% 5
246439000 243.5 8446.8 2.88% 7
412320033 239.0 4777.7 5.0% 5
760000770 238.5 2228.5 10.7% 22
627066000 236.0 4210.2 5.61% 1
416000205 234.8 2587.5 9.08% 6
412050505 227.3 2904.6 7.82% 4
412440000 225.5 3543.7 6.36% 5
440125190 220.7 3997.1 5.52% 8
244977000 218.2 8798.0 2.48% 4
412448798 213.3 1453.8 14.67% 3
200005566 202.0 3456.8 5.84% 4
412444545 200.4 3173.7 6.31% 5
412400057 197.6 1994.6 9.9% 2
412443675 179.0 1460.7 12.25% 8
412010156 175.8 1471.3 11.95% 1
416000005 167.6 5244.5 3.2% 9
366701650 164.6 3073.0 5.36% 3
412002358 157.9 2433.7 6.49% 3
366993020 152.4 6459.9 2.36% 3
412420504 151.4 8079.1 1.87% 3
412358888 146.4 5270.0 2.78% 6
800033690 140.8 2494.5 5.65% 4
245703000 139.4 8005.7 1.74% 7
412435126 136.9 2688.3 5.09% 3
800123456 121.6 2222.4 5.47% 3
412333888 121.4 3014.9 4.03% 3
412400001 119.6 1401.2 8.53% 2
800043433 119.0 1256.7 9.47% 2
412070707 116.9 2639.7 4.43% 5
412420168 113.0 4665.9 2.42% 2
412449305 112.2 2832.6 3.96% 4
412000060 110.3 770.5 14.31% 1
416000004 109.5 2788.7 3.93% 3
263407550 109.4 5268.7 2.08% 5
413270430 109.3 4546.3 2.4% 17
200000012 105.3 2370.2 4.44% 2
412435251 104.9 1977.5 5.3% 2
412421701 104.4 8606.0 1.21% 3
926012630 104.1 614.1 16.95% 2
412449632 102.1 1958.8 5.21% 3
412438888 102.1 3477.3 2.94% 6
412448599 101.2 1660.5 6.1% 4
412025038 98.7 1192.1 8.28% 1
412449972 94.6 2652.9 3.57% 4
412006688 94.4 1332.6 7.08% 2
366996880 91.7 8160.9 1.12% 5
412326877 90.7 6185.9 1.47% 4
412354269 89.9 7728.2 1.16% 6
412369061 87.7 1253.6 7.0% 2
800037101 87.5 2279.3 3.84% 3
412421356 86.3 7647.8 1.13% 4
412422832 86.0 2207.3 3.89% 2
412420163 83.9 5300.4 1.58% 4
413056789 83.4 831.1 10.04% 2
412413049 83.2 2150.0 3.87% 1
412002348 80.9 1924.0 4.2% 4
412478293 78.7 1331.7 5.91% 2
416167900 78.7 8010.6 0.98% 1
412421421 78.3 3930.6 1.99% 5
412449967 77.4 1572.6 4.92% 2
412420279 75.1 5446.6 1.38% 3
367420490 74.9 8017.5 0.93% 6
412415282 73.6 2607.1 2.82% 1
412025066 72.2 2465.8 2.93% 6
412448626 70.8 1570.4 4.51% 3
367338850 70.7 8492.5 0.83% 4
205161000 70.5 8249.2 0.85% 2
416004319 70.1 6085.8 1.15% 5
273416010 68.7 6160.4 1.11% 4
900022212 67.7 561.6 12.05% 1
412205541 67.4 2543.6 2.65% 4
247120160 67.0 2882.9 2.32% 1
412449645 65.7 3233.5 2.03% 2
367687870 65.3 500.2 13.04% 1
412188188 63.7 1393.9 4.57% 3
997753357 63.5 887.5 7.16% 1
412365021 62.7 2359.6 2.66% 3
412324763 62.7 386.1 16.25% 1
416000029 62.1 3919.9 1.59% 1
412410694 61.9 2831.5 2.18% 3
412213988 61.1 1388.5 4.4% 2
800016013 59.0 1152.0 5.12% 1
431704490 58.7 7401.7 0.79% 1
412436928 58.7 1630.9 3.6% 1
412448835 58.2 2145.7 2.71% 2
412432909 58.0 300.0 19.34% 1
367561870 57.9 7835.0 0.74% 3
412450349 57.9 2801.5 2.07% 2
416240800 57.5 8349.3 0.69% 1
412450061 57.3 3445.9 1.66% 2
412370008 56.7 4061.9 1.4% 2
800005678 56.4 1987.0 2.84% 4
412440499 55.7 4336.9 1.28% 1
412320038 55.1 2322.1 2.37% 3
200000288 54.9 866.5 6.33% 1
367143710 54.5 8236.5 0.66% 2
412555588 54.1 2215.4 2.44% 1
412413253 54.0 1894.9 2.85% 2
416004802 53.5 2196.5 2.44% 1
367692120 53.3 2607.6 2.04% 1
412111999 52.7 5230.9 1.01% 2
412449946 50.6 2807.2 1.8% 1
412005678 50.4 3057.2 1.65% 3
412449922 50.4 1107.4 4.55% 1
412325623 50.2 1508.7 3.33% 1
412352139 50.0 1357.3 3.68% 1
412349042 49.9 859.7 5.8% 1
263377000 49.6 4705.1 1.05% 2
412413414 49.5 1279.9 3.87% 1
412567898 48.4 1564.8 3.09% 3
412449638 47.9 2699.2 1.77% 2
412366888 45.5 2130.8 2.14% 2
251315240 45.5 15309.8 0.3% 1
412428638 44.8 2545.6 1.76% 3
367637970 44.4 8514.8 0.52% 3
412439605 43.9 8548.5 0.51% 1
412449096 43.5 2843.0 1.53% 2
412447634 42.9 3314.4 1.29% 1
259244000 42.9 7129.9 0.6% 2
412568988 42.8 1264.8 3.39% 3
412370545 41.9 744.2 5.63% 1
416000818 41.7 5784.1 0.72% 2
701006084 41.6 8408.9 0.49% 2
905101688 41.2 902.5 4.57% 1
245150000 41.0 3794.3 1.08% 1
412900356 40.8 2055.4 1.99% 2
416770000 40.5 2800.4 1.45% 4
412423522 40.0 524.4 7.63% 1
412001568 38.8 783.6 4.95% 3
412448771 38.8 2247.4 1.73% 2
412367718 38.3 1204.3 3.18% 1
412444556 38.1 1297.7 2.94% 2
800031309 37.1 3105.8 1.19% 2
259647000 36.4 7464.8 0.49% 3
412025012 35.9 956.5 3.75% 1
412325431 35.2 612.0 5.76% 1
412203447 35.2 4636.0 0.76% 1
412449578 35.1 3056.9 1.15% 3
440604000 35.1 5729.1 0.61% 2
413001486 34.8 2576.8 1.35% 1
412449914 34.8 2841.5 1.22% 3
412345666 34.6 1201.1 2.88% 2
413080089 34.5 1239.4 2.78% 1
412416187 33.6 1540.3 2.18% 2
412438868 33.6 1010.7 3.32% 2
412450101 33.1 2250.7 1.47% 2
412369856 32.6 2143.4 1.52% 2
412325298 31.6 2677.0 1.18% 3
367618990 30.6 8099.7 0.38% 1
412420561 30.5 3159.9 0.97% 2
905116766 30.5 552.4 5.53% 1
800036789 30.3 2378.7 1.27% 2
412450251 30.2 2655.0 1.14% 2
413011000 30.2 2701.8 1.12% 1
412420809 30.0 8075.6 0.37% 1
412449965 30.0 2052.2 1.46% 2
412325423 29.5 2170.4 1.36% 1
412413817 29.1 925.7 3.14% 1
412413988 29.1 1401.8 2.08% 1
431797000 29.1 6722.3 0.43% 1
416228600 27.9 650.4 4.29% 1
503023550 27.8 5651.5 0.49% 1
366916960 27.3 6334.6 0.43% 2
412488988 27.2 2164.9 1.26% 1
412449205 27.0 858.5 3.14% 1
412596222 26.7 2174.9 1.23% 1
412428476 26.6 2397.6 1.11% 1
412213823 26.6 643.8 4.14% 2
412324115 26.2 1768.5 1.48% 2
412321464 25.9 3338.8 0.78% 2
412480433 25.5 2852.4 0.89% 1
413002288 25.2 1151.9 2.19% 1
251184640 25.2 7623.2 0.33% 1
367037240 25.2 9303.1 0.27% 2
412460003 25.1 2736.5 0.92% 1
503005980 24.5 740.5 3.31% 1
412427039 23.7 3220.4 0.73% 1
412448839 23.4 1395.0 1.67% 1
412286356 23.2 2488.6 0.93% 1
820040915 22.5 2476.0 0.91% 2
412461317 22.4 3268.7 0.69% 1
412411001 22.0 2459.0 0.89% 1
412450031 21.7 2657.1 0.82% 3
412449971 21.6 1993.8 1.08% 1
224259000 21.4 6320.2 0.34% 1
412427888 21.2 2113.4 1.0% 1
412205402 21.1 1639.1 1.28% 2
412286195 20.8 1810.8 1.15% 1
412235676 20.8 214.2 9.71% 1
245282000 20.0 8775.3 0.23% 2
900000390 19.8 2540.5 0.78% 1
251162340 19.3 6338.0 0.3% 1
412433335 19.3 1339.4 1.44% 1
224065000 18.9 5668.9 0.33% 1
209212000 18.9 2747.7 0.69% 1
224011150 18.6 3463.8 0.54% 1
251083110 18.3 8770.8 0.21% 1
412596668 18.2 4142.7 0.44% 2
222256789 17.3 2277.8 0.76% 1
432395000 16.3 4437.5 0.37% 1
412449014 16.2 2054.0 0.79% 1
412283596 15.8 1328.3 1.19% 2
412213456 15.7 1234.9 1.27% 1
111111118 15.4 1662.7 0.93% 1
416002934 15.3 5795.0 0.26% 1
412449198 15.0 2155.3 0.7% 1
416234700 14.9 5990.4 0.25% 1
412331106 14.5 3478.0 0.42% 1
257200220 14.5 11993.8 0.12% 1
413778422 14.5 609.8 2.37% 1
412449817 14.4 2947.7 0.49% 2
251540540 13.8 8748.1 0.16% 1
412410675 13.7 2653.3 0.52% 2
412205577 13.5 1148.0 1.17% 1
367301390 13.2 7222.6 0.18% 1
416000199 13.0 1134.2 1.14% 1
440192000 12.7 7601.5 0.17% 1
251408110 12.5 8772.2 0.14% 1
412324201 11.5 815.3 1.42% 3
412400069 11.5 4181.5 0.28% 1
412285023 11.4 1438.7 0.79% 1
235003589 11.3 4063.3 0.28% 1
251468000 11.2 8751.5 0.13% 1
701006609 10.9 10001.6 0.11% 2
413201502 10.3 837.5 1.23% 1
412352498 9.6 449.4 2.13% 1
412215051 9.5 455.5 2.08% 1
412320088 9.4 626.6 1.5% 2
412446582 9.4 2692.3 0.35% 1
413088896 9.2 3974.5 0.23% 1
235002810 8.9 8295.4 0.11% 1
412002518 8.7 1377.5 0.63% 1
412442222 8.5 282.7 3.02% 1
412444878 8.5 2495.4 0.34% 1
412460002 8.4 5088.0 0.16% 2
412001488 8.1 1468.1 0.55% 1
412423536 7.7 1518.9 0.51% 1
412002208 7.5 743.0 1.0% 1
431000580 7.4 2806.4 0.26% 1
412469618 7.4 1390.1 0.53% 2
413811217 7.1 828.7 0.86% 1
800003372 7.0 2376.1 0.3% 1
412056888 6.6 742.8 0.89% 2
367690430 6.5 2252.1 0.29% 1
800035708 6.5 1215.6 0.53% 1
412005008 6.2 844.7 0.73% 1
412450009 6.2 2127.2 0.29% 1
412423566 6.2 3078.7 0.2% 1
416000006 6.0 3556.0 0.17% 1
760000610 6.0 1868.8 0.32% 2
413999666 5.9 1100.4 0.54% 1
412596666 5.8 1914.4 0.3% 1
412412427 5.4 1730.3 0.31% 2
412303381 5.2 1250.4 0.42% 1
316001870 5.1 5418.7 0.09% 1
412008899 4.9 829.9 0.59% 1
412432139 4.8 755.2 0.64% 1
263415290 4.7 7717.7 0.06% 1
367070360 4.7 8056.9 0.06% 1
251184840 4.7 8764.6 0.05% 1
367853000 4.7 3267.4 0.15% 1
503362900 4.6 8309.2 0.06% 2
412008002 4.6 1222.4 0.38% 1
441667000 4.6 8764.2 0.05% 1
338053768 4.6 752.7 0.61% 1
659257000 4.6 4258.8 0.11% 1
251151000 4.5 8659.0 0.05% 1
412568569 4.5 2181.1 0.2% 1
316001828 4.4 8685.5 0.05% 1
431838000 4.4 8405.2 0.05% 1
245310000 4.4 6428.6 0.07% 1
204810000 4.3 4752.8 0.09% 1
412420657 4.3 2755.7 0.16% 1
224901000 4.2 8231.1 0.05% 1
412349156 4.1 2615.4 0.16% 2
412450287 4.1 2465.9 0.17% 1
263408350 4.0 3038.6 0.13% 1
428110000 4.0 2580.0 0.15% 1
200000975 4.0 1086.4 0.36% 1
200005317 3.9 3559.3 0.11% 1
412421789 3.9 1519.5 0.25% 1
412436679 3.8 952.4 0.4% 1
412325362 3.7 729.9 0.5% 1
412033143 3.6 352.4 1.02% 1
367510820 3.6 6903.6 0.05% 1
367615170 3.5 764.1 0.45% 1
440662000 3.4 5003.4 0.07% 1
224187770 3.4 4297.0 0.08% 1
235000910 3.4 6921.0 0.05% 1
412439477 3.4 1414.0 0.24% 1
251166110 3.3 8764.2 0.04% 1
412449247 3.1 2807.5 0.11% 1
412324328 3.1 671.5 0.47% 1
412890966 3.1 664.6 0.47% 1
412002362 3.1 1986.6 0.16% 1
255402650 3.1 4641.1 0.07% 1
998877000 3.1 1532.6 0.2% 1
376602000 3.0 6339.6 0.05% 1
432507000 3.0 8048.0 0.04% 1
251848370 3.0 5656.5 0.05% 1
431801000 3.0 7528.1 0.04% 1
244309000 3.0 8417.2 0.04% 1
235050371 3.0 8441.5 0.04% 1
412320396 3.0 1849.7 0.16% 1
224134470 2.9 4502.5 0.06% 1
220088000 2.9 7914.1 0.04% 1
412325856 2.9 1002.6 0.29% 1
263401690 2.8 3315.8 0.08% 1
227118700 2.7 4893.4 0.05% 1
503660600 2.7 5466.3 0.05% 1
412325438 2.7 945.6 0.29% 1
725004140 2.6 5933.0 0.04% 1
412413099 2.5 1928.7 0.13% 1
412442033 2.5 608.4 0.41% 1
412328731 2.4 5651.2 0.04% 1
219121000 2.4 7072.4 0.03% 1
503018650 2.3 3232.9 0.07% 1
431700150 2.3 7772.3 0.03% 1
366899490 2.3 7025.1 0.03% 1
412000998 2.2 1487.9 0.15% 1
234079000 2.2 6066.2 0.04% 1
412444965 2.1 3511.9 0.06% 1
412418462 2.1 2896.1 0.07% 1
412433515 2.1 1223.0 0.17% 1
219006835 2.1 8748.5 0.02% 1
412000099 2.0 2025.7 0.1% 1
431602020 1.9 8593.4 0.02% 1
368412000 1.9 4455.5 0.04% 1
412425055 1.8 2256.2 0.08% 1
263411780 1.8 2337.8 0.08% 1
701006044 1.7 10174.4 0.02% 1
231124000 1.7 8662.2 0.02% 1
512027000 1.7 6111.2 0.03% 1
227385000 1.7 3388.2 0.05% 1
251269000 1.6 8751.3 0.02% 1
412450341 1.6 1407.4 0.11% 1
259758000 1.6 8748.6 0.02% 1
226177000 1.6 7336.6 0.02% 1
234690000 1.6 3266.3 0.05% 1
231118000 1.6 7434.5 0.02% 1
412446587 1.6 323.9 0.48% 1
231116000 1.6 8727.3 0.02% 1
251227000 1.6 8616.9 0.02% 1
665111119 1.5 7285.8 0.02% 1
412420913 1.5 8813.5 0.02% 1
421123456 1.4 2568.4 0.06% 1
412368688 1.4 739.5 0.19% 1
412427858 1.4 3889.5 0.04% 1
412433192 1.3 1200.4 0.11% 1
247144390 1.3 3033.7 0.04% 1
247142250 1.3 2431.2 0.05% 1
701000778 1.2 8939.3 0.01% 1
701006037 1.2 9733.1 0.01% 1
701006214 1.2 6033.1 0.02% 1
701030000 1.1 6425.2 0.02% 1
247147420 1.1 2622.4 0.04% 1
440854000 1.0 8608.3 0.01% 1
263435000 1.0 5481.6 0.02% 1
441219253 1.0 5165.8 0.02% 1
440339000 1.0 8760.1 0.01% 1
412329692 1.0 7238.3 0.01% 1
412378999 0.9 1805.2 0.05% 1
701000687 0.9 4613.9 0.02% 1
244563000 0.9 7806.0 0.01% 1
701000660 0.9 5578.2 0.02% 1
412324128 0.9 1911.7 0.05% 1
701000617 0.9 9167.5 0.01% 1
416002923 0.9 6615.1 0.01% 1
701100000 0.9 4838.8 0.02% 1
503711500 0.9 6167.3 0.01% 1
251038000 0.9 8707.0 0.01% 1
412449943 0.9 2243.5 0.04% 1
701006068 0.9 8251.3 0.01% 1
412329686 0.9 7957.8 0.01% 1
412437731 0.9 2669.2 0.03% 1
701094000 0.9 7372.8 0.01% 1
701006005 0.9 7037.7 0.01% 1
263406570 0.9 6291.9 0.01% 1
701000786 0.9 8752.5 0.01% 1
412764850 0.7 5691.1 0.01% 1
235005510 0.7 2393.8 0.03% 1
701000928 0.7 5251.5 0.01% 1
412061979 0.7 542.7 0.13% 1
412433047 0.7 1009.3 0.07% 1
224752000 0.6 8759.7 0.01% 1
601155600 0.6 4366.2 0.01% 1
412002188 0.6 1186.8 0.05% 1
431935000 0.6 7970.5 0.01% 1
431487000 0.6 7128.2 0.01% 1
431722000 0.6 7668.4 0.01% 1
432399000 0.6 4432.7 0.01% 1
431769000 0.6 7460.7 0.01% 1
431171000 0.6 8184.8 0.01% 1
227144900 0.5 5571.7 0.01% 1
237659000 0.5 1330.2 0.04% 1
352956000 0.5 701.7 0.07% 1
412555589 0.5 1832.4 0.03% 1
273437810 0.5 6374.9 0.01% 1
416004856 0.5 6348.6 0.01% 2
412323926 0.4 1232.5 0.03% 1
659070000 0.4 6602.1 0.01% 1
261007950 0.4 11381.1 0.0% 1
412320404 0.3 684.7 0.05% 1
263407860 0.3 5759.4 0.0% 1
412205637 0.3 2044.1 0.01% 2
228106800 0.3 8197.9 0.0% 1
224180740 0.3 2599.6 0.01% 1
228239900 0.3 5140.9 0.0% 1
412349781 0.2 1868.9 0.01% 1
576736000 0.2 8089.4 0.0% 1
412000985 0.2 460.9 0.05% 1
412448806 0.2 2097.2 0.01% 1
412445202 0.1 2749.4 0.0% 1
367161930 0.1 8693.3 0.0% 1
412422693 0.1 3746.4 0.0% 1
701000674 0.1 3404.8 0.0% 1
227102900 0.1 8261.3 0.0% 1
412417558 0.1 543.2 0.02% 1
224093750 0.0 4840.1 0.0% 1
259449000 0.0 8604.4 0.0% 1
316065000 0.0 8678.7 0.0% 1
416112500 0.0 1041.5 0.0% 1
412694550 0.0 3461.3 0.0% 1
226230000 0.0 8412.9 0.0% 1
230002550 0.0 7943.4 0.0% 1
261009040 0.0 8668.4 0.0% 1

In [6]:
print "total number of vessels with at least one overlap:", len(overlaps)


total number of vessels with at least one overlap: 480

In [15]:
for i in range(len(overlaps)):
    o = overlaps[i]
    if o[0] == '258154000':
        print o

Let's map some of these


In [10]:
def map_mmsi(mmsi, title = ""):
    if title == "":
        title = 'Vessel '+str(mmsi)
    plt.rcParams["figure.figsize"] = [12,7]
    q = ''' select * from (select timestamp, lat, lon, speed, seg_id 
    FROM (TABLE_DATE_RANGE([pipeline_classify_fishing.], TIMESTAMP('2015-01-01'), TIMESTAMP('2015-12-31')))
    where mmsi = '''+str(mmsi) + '''
    and lat is not null and lon is not null 
    and lat <90 and lat>-90 and lon<180 and lon>-180)
    where seg_id in (select seg_id from [scratch_david_seg_analysis.good_fishing_segments_2015]
    where mmsi ='''+str(mmsi) + ''' )
    order by timestamp'''

    ves = Query(q)

    lats = np.array([float(v[1]) for v in ves])
    lons = np.array([float(v[2]) for v in ves])

    min_lat = min(lats)
    min_lon = min(lons)
    max_lat = max(lats)
    max_lon = max(lons)

    if max_lat>90: max_lat = 90
    if min_lat<-90: min_lat = -90

    m =\
    Basemap(llcrnrlon=min_lon,llcrnrlat=min_lat,urcrnrlon=max_lon,urcrnrlat=max_lat,projection='mill', resolution ='l')

    m.drawmapboundary()
    m.fillcontinents("#cccccc")
    x,y= m(lons,lats)
    color =  "#7879FA"   
    size = 2
    m.plot(x, y, lw=1, color = "#000000",alpha=.5)
    m.plot(x, y, marker = 'o', alpha=.5,lw=0, ms=5,markeredgecolor = 'none',color = "#FF0000")
    plt.title(title)   
    plt.savefig(str(mmsi)+".png",bbox_inches='tight',dpi=900,transparent=True,pad_inches=.1)
    plt.show()
    plt.clf()

In [16]:
# 30 hours of spoofing
map_mmsi(412420809)


Waiting on bqjob_r9c89b24725af8fe_00000154732c5381_2 ... (15s) Current status: DONE   
Query time: 19.4366979599 seconds.
<matplotlib.figure.Figure at 0x10cc86290>

In [17]:
#  30 hours of spoofing
map_mmsi(800036789)


Waiting on bqjob_r2a677eed0fc9528c_00000154732ceacd_3 ... (23s) Current status: DONE   
Query time: 27.9730279446 seconds.
<matplotlib.figure.Figure at 0x10df1bf90>

Yes, spoofing


In [18]:
# 10 hours of spoofing
map_mmsi(701006609)


Waiting on bqjob_r18beff60457a2f19_00000154732e7a41_4 ... (23s) Current status: DONE   
Query time: 40.7500221729 seconds.
<matplotlib.figure.Figure at 0x10defb790>

Doesn't look like it from this, but on closer inspection, it is spoofing

SELECT
  *
FROM (
  SELECT
    timestamp,
    lat,
    lon,
    speed,
    seg_id
  FROM (TABLE_DATE_RANGE([pipeline_classify_fishing.], TIMESTAMP('2015-01-01'), TIMESTAMP('2015-12-31')))
  WHERE
    lat IS NOT NULL
    AND lon IS NOT NULL
    AND lat <90
    AND lat>-90
    AND lon<180
    AND lon>-180)
WHERE
  seg_id IN ('701006609-2015-03-30T00:14:58.000000Z',
    '701006609-2015-03-29T16:28:42.000000Z')
ORDER BY
  timestamp

In [19]:
# 10 hours of spoofing
map_mmsi(413201502)


Waiting on bqjob_r42a06a0fb1bb1df1_00000154732fca92_5 ... (15s) Current status: DONE   
Query time: 18.4555640221 seconds.
<matplotlib.figure.Figure at 0x11d898490>

probably, but hard to tell


In [20]:
# 5 hours of spoofing
map_mmsi(263415290)


Waiting on bqjob_r76872bcd10f11bd8_00000154733de974_6 ... (15s) Current status: DONE   
Query time: 23.1419098377 seconds.
<matplotlib.figure.Figure at 0x10defb0d0>

Looks like it for just a few hours


In [21]:
print "mmsi spoofing_hours total_hours spoofing_Percent overlapping_segments"
for i in range(50,len(overlaps),25): # no need to start on the obvious spoofing one
    o = overlaps[i]
    map_mmsi(o[0], "Vessel "+o[0] + ", "+o[1]+" hours spoofing, "+o[3]+"% time spoofing, "+o[4]+" overlapping segs: ")
    #print o[0],o[1],o[2],o[3]+"%", o[4]


mmsi spoofing_hours total_hours spoofing_Percent overlapping_segments
Waiting on bqjob_r732547abf90f918b_00000154733eddca_7 ... (14s) Current status: DONE   
Query time: 22.7877290249 seconds.
Waiting on bqjob_r790f4f72e0bb72b2_00000154733f839c_8 ... (23s) Current status: DONE   
Query time: 29.380439043 seconds.
Waiting on bqjob_r6a96a2cbebf9bec1_000001547340420a_9 ... (23s) Current status: DONE   
Query time: 28.1823270321 seconds.
Waiting on bqjob_r3e97854668b69ddb_000001547340f09b_10 ... (23s) Current status: DONE   
Query time: 27.7723579407 seconds.
Waiting on bqjob_r390e7c4950728626_000001547341aab0_11 ... (15s) Current status: DONE   
Query time: 19.6311221123 seconds.
Waiting on bqjob_r523862ea97537b86_0000015473423f84_12 ... (15s) Current status: DONE   
Query time: 18.884816885 seconds.
Waiting on bqjob_r1fab0e4796e78907_000001547342d28a_13 ... (14s) Current status: DONE   
Query time: 18.8462719917 seconds.
Waiting on bqjob_r29028fada7639318_0000015473436690_14 ... (14s) Current status: DONE   
Query time: 18.9064240456 seconds.
Waiting on bqjob_r1fd95e2b583b7e9c_000001547343f5b3_15 ... (23s) Current status: DONE   
Query time: 29.6137051582 seconds.
Waiting on bqjob_r53730d8636f1612d_000001547344b3b2_16 ... (15s) Current status: DONE   
Query time: 20.1901669502 seconds.
Waiting on bqjob_r24a5792f455dd12f_00000154734553c8_17 ... (14s) Current status: DONE   
Query time: 19.8885960579 seconds.
Waiting on bqjob_rbdf1c7a98a4cff_000001547345e83a_18 ... (23s) Current status: DONE   
Query time: 29.4780049324 seconds.
Waiting on bqjob_r405d53fff243ab51_000001547346b07a_19 ... (14s) Current status: DONE   
Query time: 18.7648420334 seconds.
Waiting on bqjob_r3325ab7749b58c56_00000154734740f2_20 ... (14s) Current status: DONE   
Query time: 18.9457352161 seconds.
Waiting on bqjob_r1f4103c0ae1618e5_000001547347ddcc_21 ... (14s) Current status: DONE   
Query time: 18.5730450153 seconds.
Waiting on bqjob_r154e5bed2383743f_0000015473486933_22 ... (14s) Current status: DONE   
Query time: 18.8396370411 seconds.
Waiting on bqjob_r7b9626aa6fa6ce51_000001547348fb58_23 ... (23s) Current status: DONE   
Query time: 27.2438559532 seconds.
Waiting on bqjob_r6e4500808e804dee_000001547349ac5f_24 ... (14s) Current status: DONE   
Query time: 18.8771650791 seconds.
<matplotlib.figure.Figure at 0x10ccde750>

In [22]:
import csv
with open('spoofing_hours_2015.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerow(["mmsi","spoofing_hours","total_hours","spoofing_percent","overlapping_segments","num_segments"])
    writer.writerows(overlaps)