Decoding Filing Periods

The raw data tables mix together filings from different reporting periods (e.g. quarterlys vs. semi-annual vs. pre-elections). But we need these filings to be sorted (or at least sortable) so that or users, for example, can compare the performance of two candidates in the same reporting period.

There are two vectors at play here:

  1. The "Statement Type", as described in CAL-ACCESS parlance, which indicates the length of time covered by the filing and how close it was filed to the election.
  2. The actual time interval the filing covers, denoted by a start date and an end date.

This notebook is pulling data from the downloads-website's dev database, which was last updated on...


In [2]:
from calaccess_processed.models.tracking import ProcessedDataVersion

In [3]:
ProcessedDataVersion.objects.latest()


Out[3]:
<ProcessedDataVersion: 2016-09-26 11:20:39+00:00>

Will also need to execute some raw SQL, so I'll import a helper function in order to make the results more readable:


In [4]:
from project import sql_to_agate

Let's start by examining the distinct values of the statement type on CVR_CAMPAIGN_DISCLOSURE_CD. And let's narrow the scope to only the Form 460 filings.


In [5]:
sql_to_agate(
    """
    SELECT UPPER("STMT_TYPE"), COUNT(*)
    FROM "CVR_CAMPAIGN_DISCLOSURE_CD"
    WHERE "FORM_TYPE" = 'F460'
    GROUP BY 1
    ORDER BY COUNT(*) DESC;
    """
).print_table()


|--------+---------|
|  upper |  count  |
|--------+---------|
|  SA    | 62,439  |
|  PE    | 48,984  |
|  QT    |  4,208  |
|  TS    |  4,152  |
|  SY    |  1,829  |
|  SE    |    696  |
|  **    |     96  |
|        |     56  |
|  PR    |      5  |
|  S2    |      4  |
|  S1    |      3  |
|  QS    |      2  |
|  YE    |      1  |
|  S     |      1  |
|--------+---------|

Not all of these values are defined, as previously noted in our docs:

  • PR might be pre-election
  • QS is pro probably quarterly statement
  • YE might be...I don't know "Year-end"?
  • S is probably semi-annual

Maybe come back later and look at the actual filings. There aren't that many.

There's another similar-named column on FILER_FILINGS_CD, but this seems to be a completely different thing:


In [6]:
sql_to_agate(
    """
    SELECT FF."STMNT_TYPE", LU."CODE_DESC", COUNT(*)
    FROM "FILER_FILINGS_CD" FF
    JOIN "LOOKUP_CODES_CD" LU
    ON FF."STMNT_TYPE" = LU."CODE_ID"
    AND LU."CODE_TYPE" = 10000
    GROUP BY 1, 2;
    """
).print_table()


|-------------+----------------------+------------|
|  STMNT_TYPE | CODE_DESC            |     count  |
|-------------+----------------------+------------|
|      10,004 | REDESIGNATE THE A... |       335  |
|      10,006 | LOG/AMENDMENT        |    97,221  |
|      10,007 | AS FILED BY COMMI... |     1,281  |
|      10,002 | AMENDMENT            |   108,166  |
|      10,003 | TERMINATION          |    57,391  |
|      10,005 | LOG                  |   459,319  |
|      10,001 | ORIGINAL/INITIAL     | 1,191,192  |
|-------------+----------------------+------------|

One of the tables that caught my eye is FILING_PERIOD_CD, which appears to have a row for each quarterly filing period:


In [7]:
sql_to_agate(
    """
    SELECT *
    FROM "FILING_PERIOD_CD"
    """
).print_table()


|----------+-----------+------------+------------+-------------+--------------+------|
|       id | PERIOD_ID | START_DATE |   END_DATE | PERIOD_TYPE | PER_GRP_TYPE | ...  |
|----------+-----------+------------+------------+-------------+--------------+------|
|  124,529 |       889 | 2182-01-01 | 2182-03-31 |       1,500 |        1,500 | ...  |
|  124,530 |       890 | 2182-04-01 | 2182-06-30 |       1,500 |        1,500 | ...  |
|  124,531 |       891 | 2182-07-01 | 2182-09-30 |       1,500 |        1,500 | ...  |
|  124,532 |       826 | 2166-04-01 | 2166-06-30 |       1,500 |        1,500 | ...  |
|  124,533 |       827 | 2166-07-01 | 2166-09-30 |       1,500 |        1,500 | ...  |
|  124,534 |       828 | 2166-10-01 | 2166-12-31 |       1,500 |        1,500 | ...  |
|  124,535 |       829 | 2167-01-01 | 2167-03-31 |       1,500 |        1,500 | ...  |
|  124,536 |       830 | 2167-04-01 | 2167-06-30 |       1,500 |        1,500 | ...  |
|  124,537 |       831 | 2167-07-01 | 2167-09-30 |       1,500 |        1,500 | ...  |
|  124,538 |       832 | 2167-10-01 | 2167-12-31 |       1,500 |        1,500 | ...  |
|  124,539 |       833 | 2168-01-01 | 2168-03-31 |       1,500 |        1,500 | ...  |
|  124,540 |       834 | 2168-04-01 | 2168-06-30 |       1,500 |        1,500 | ...  |
|  124,541 |       835 | 2168-07-01 | 2168-09-30 |       1,500 |        1,500 | ...  |
|  124,542 |       836 | 2168-10-01 | 2168-12-31 |       1,500 |        1,500 | ...  |
|  124,543 |       837 | 2169-01-01 | 2169-03-31 |       1,500 |        1,500 | ...  |
|  124,544 |       838 | 2169-04-01 | 2169-06-30 |       1,500 |        1,500 | ...  |
|  124,545 |       839 | 2169-07-01 | 2169-09-30 |       1,500 |        1,500 | ...  |
|  124,546 |       840 | 2169-10-01 | 2169-12-31 |       1,500 |        1,500 | ...  |
|  124,547 |       841 | 2170-01-01 | 2170-03-31 |       1,500 |        1,500 | ...  |
|  124,548 |       842 | 2170-04-01 | 2170-06-30 |       1,500 |        1,500 | ...  |
|      ... |       ... |        ... |        ... |         ... |          ... | ...  |
|----------+-----------+------------+------------+-------------+--------------+------|

Every period is described as a quarter, and the records are equally divided among them:


In [8]:
sql_to_agate(
    """
    SELECT "PERIOD_DESC", COUNT(*)
    FROM "FILING_PERIOD_CD"
    GROUP BY 1;
    """
).print_table()


|--------------+--------|
|  PERIOD_DESC | count  |
|--------------+--------|
|  QUARTER 3   |   362  |
|  QUARTER 4   |   362  |
|  QUARTER 1   |   362  |
|  QUARTER 2   |   362  |
|--------------+--------|

The difference between every START_DATE and END_DATE is actually a three-month interval:


In [9]:
sql_to_agate(
    """
    SELECT "END_DATE" - "START_DATE" AS duration, COUNT(*)
    FROM "FILING_PERIOD_CD"
    GROUP BY 1;
    """
).print_table()


|-----------+--------|
|  duration | count  |
|-----------+--------|
|        89 |   275  |
|        91 |   724  |
|        90 |   449  |
|-----------+--------|

And they have covered every year between 1973 and 2334 (how optimistic!):


In [10]:
sql_to_agate(
    """
    SELECT DATE_PART('year', "START_DATE")::int as year, COUNT(*)
    FROM "FILING_PERIOD_CD"
    GROUP BY 1
    ORDER BY 1 DESC;
    """
).print_table()


|--------+--------|
|   year | count  |
|--------+--------|
|  2,334 |     4  |
|  2,333 |     4  |
|  2,332 |     4  |
|  2,331 |     4  |
|  2,330 |     4  |
|  2,329 |     4  |
|  2,328 |     4  |
|  2,327 |     4  |
|  2,326 |     4  |
|  2,325 |     4  |
|  2,324 |     4  |
|  2,323 |     4  |
|  2,322 |     4  |
|  2,321 |     4  |
|  2,320 |     4  |
|  2,319 |     4  |
|  2,318 |     4  |
|  2,317 |     4  |
|  2,316 |     4  |
|  2,315 |     4  |
|    ... |   ...  |
|--------+--------|

Filings are linked to filing periods via FILER_FILINGS_CD.PERIOD_ID. While that column is not always populated, it is if you limit your results to just the Form 460 filings:


In [11]:
sql_to_agate(
    """
    SELECT ff."PERIOD_ID", fp."START_DATE", fp."END_DATE", fp."PERIOD_DESC", COUNT(*)
    FROM "FILER_FILINGS_CD" ff
    JOIN "CVR_CAMPAIGN_DISCLOSURE_CD" cvr
    ON ff."FILING_ID" = cvr."FILING_ID"
    AND ff."FILING_SEQUENCE" = cvr."AMEND_ID"
    AND cvr."FORM_TYPE" = 'F460'
    JOIN "FILING_PERIOD_CD" fp
    ON ff."PERIOD_ID" = fp."PERIOD_ID"
    GROUP BY 1, 2, 3, 4
    ORDER BY fp."START_DATE" DESC;
    """
).print_table()


|------------+------------+------------+-------------+--------|
|  PERIOD_ID | START_DATE |   END_DATE | PERIOD_DESC | count  |
|------------+------------+------------+-------------+--------|
|        450 | 2072-04-01 | 2072-06-30 | QUARTER 2   |     1  |
|        227 | 2016-07-01 | 2016-09-30 | QUARTER 3   |    53  |
|        226 | 2016-04-01 | 2016-06-30 | QUARTER 2   | 2,612  |
|        225 | 2016-01-01 | 2016-03-31 | QUARTER 1   | 2,786  |
|        224 | 2015-10-01 | 2015-12-31 | QUARTER 4   |   282  |
|        223 | 2015-07-01 | 2015-09-30 | QUARTER 3   | 2,330  |
|        222 | 2015-04-01 | 2015-06-30 | QUARTER 2   |   409  |
|        221 | 2015-01-01 | 2015-03-31 | QUARTER 1   | 2,780  |
|        220 | 2014-10-01 | 2014-12-31 | QUARTER 4   | 2,934  |
|        219 | 2014-07-01 | 2014-09-30 | QUARTER 3   | 2,676  |
|        218 | 2014-04-01 | 2014-06-30 | QUARTER 2   | 1,563  |
|        217 | 2014-01-01 | 2014-03-31 | QUARTER 1   | 4,522  |
|        216 | 2013-10-01 | 2013-12-31 | QUARTER 4   |   452  |
|        215 | 2013-07-01 | 2013-09-30 | QUARTER 3   | 2,595  |
|        214 | 2013-04-01 | 2013-06-30 | QUARTER 2   |   488  |
|        213 | 2013-01-01 | 2013-03-31 | QUARTER 1   | 2,881  |
|        212 | 2012-10-01 | 2012-12-31 | QUARTER 4   | 3,181  |
|        211 | 2012-07-01 | 2012-09-30 | QUARTER 3   | 2,710  |
|        210 | 2012-04-01 | 2012-06-30 | QUARTER 2   | 1,537  |
|        209 | 2012-01-01 | 2012-03-31 | QUARTER 1   | 4,379  |
|        ... |        ... |        ... | ...         |   ...  |
|------------+------------+------------+-------------+--------|

Also, is Schwarzenegger running this cycle? Who else could be filing from so far into the future?

AAANNNNYYYway...Also need to check to make sure the join between FILER_FILINGS_CD and CVR_CAMPAIGN_DISCLOSURE_CD isn't filtering out too many filings:


In [12]:
sql_to_agate(
    """
    SELECT cvr."FILING_ID", cvr."FORM_TYPE", cvr."FILER_NAML"
    FROM "CVR_CAMPAIGN_DISCLOSURE_CD" cvr
    LEFT JOIN  "FILER_FILINGS_CD" ff
    ON cvr."FILING_ID" = ff."FILING_ID"
    AND cvr."AMEND_ID" = ff."FILING_SEQUENCE" 
    WHERE cvr."FORM_TYPE" = 'F460'
    AND (ff."FILING_ID" IS NULL OR ff."FILING_SEQUENCE" IS NULL)
    ORDER BY cvr."FILING_ID";
    """
).print_table(max_column_width=60)


|------------+-----------+---------------------------------------------------------------|
|  FILING_ID | FORM_TYPE | FILER_NAML                                                    |
|------------+-----------+---------------------------------------------------------------|
|    591,533 | F460      | Damian Jones for Assembly                                     |
|    602,619 | F460      | United Teachers Los Angeles-Political Action Council of E...  |
|    670,063 | F460      | Carl Washington For CA St Assembly 52nd District              |
|    786,716 | F460      | Ernst & Young LLP - Los Angeles Political Action Committee    |
|    983,030 | F460      | John Doe for Senate                                           |
|  1,643,669 | F460      | TEST EFS 2012 CAMPAIGN 460                                    |
|  1,643,696 | F460      | TEST EFS 2012 460 CTL CMTT                                    |
|------------+-----------+---------------------------------------------------------------|

So only a handful, mostly local campaigns or just nonsense test data.

So another important thing to check is how well these the dates from the filing period look-up records line up with the dates on the Form 460 filing records. It would be bad if the CVR_CAMPAIGN_DISCLOSURE_CD.FROM_DATE were before FILING_PERIOD_CD.START_DATE or if CVR_CAMPAIGN_DISCLOSURE_CD.THRU_DATE were after FILING_PERIOD_CD.END_DATE.


In [13]:
sql_to_agate(
    """
    SELECT 
        CASE 
            WHEN cvr."FROM_DATE" < fp."START_DATE" THEN 'filing from_date before period start_date'
            WHEN cvr."THRU_DATE" > fp."END_DATE" THEN 'filing thru_date after period end_date'
            ELSE 'okay'
        END as test,
        COUNT(*) 
    FROM "CVR_CAMPAIGN_DISCLOSURE_CD" cvr
    JOIN "FILER_FILINGS_CD" ff
    ON cvr."FILING_ID" = ff."FILING_ID"
    AND cvr."AMEND_ID" = ff."FILING_SEQUENCE"
    JOIN "FILING_PERIOD_CD" fp
    ON ff."PERIOD_ID" = fp."PERIOD_ID"
    WHERE cvr."FORM_TYPE" = 'F460'
    GROUP BY 1;
    """
).print_table(max_column_width=60)


|--------------------------------------------+---------|
|  test                                      |  count  |
|--------------------------------------------+---------|
|  filing from_date before period start_date |     12  |
|  filing thru_date after period end_date    | 54,769  |
|  okay                                      | 67,688  |
|--------------------------------------------+---------|

So half of the time, the THRU_DATE on the filing is later than the FROM_DATE on the filing period. How big of a difference can exist between these two dates?


In [14]:
sql_to_agate(
    """
    SELECT 
            cvr."THRU_DATE" - fp."END_DATE" as date_diff,
            COUNT(*) 
    FROM "CVR_CAMPAIGN_DISCLOSURE_CD" cvr
    JOIN "FILER_FILINGS_CD" ff
    ON cvr."FILING_ID" = ff."FILING_ID"
    AND cvr."AMEND_ID" = ff."FILING_SEQUENCE"
    JOIN "FILING_PERIOD_CD" fp
    ON ff."PERIOD_ID" = fp."PERIOD_ID"
    WHERE cvr."FORM_TYPE" = 'F460'
    AND cvr."THRU_DATE" > fp."END_DATE"
    GROUP BY 1
    ORDER BY COUNT(*) DESC;
    """
).print_table(max_column_width=60)


|------------+---------|
|  date_diff |  count  |
|------------+---------|
|         91 | 23,057  |
|         92 | 16,499  |
|         47 |  3,029  |
|         52 |  1,722  |
|         49 |  1,657  |
|        275 |  1,605  |
|         50 |  1,586  |
|         23 |  1,509  |
|         22 |    394  |
|        183 |    343  |
|         18 |    301  |
|         16 |    260  |
|         20 |    205  |
|         51 |    186  |
|         21 |    173  |
|         19 |    166  |
|          4 |    137  |
|         17 |    113  |
|         32 |     96  |
|          6 |     92  |
|        ... |    ...  |
|------------+---------|

Ugh. Looks like, in most of the problem cases, the from date can be a whole quarter later than the end date of the filing period. Let's take a closer look at these...


In [15]:
sql_to_agate(
    """
    SELECT 
            cvr."FILING_ID",
            cvr."AMEND_ID",
            cvr."FROM_DATE",
            cvr."THRU_DATE",
            fp."START_DATE",
            fp."END_DATE"
    FROM "CVR_CAMPAIGN_DISCLOSURE_CD" cvr
    JOIN "FILER_FILINGS_CD" ff
    ON cvr."FILING_ID" = ff."FILING_ID"
    AND cvr."AMEND_ID" = ff."FILING_SEQUENCE"
    JOIN "FILING_PERIOD_CD" fp
    ON ff."PERIOD_ID" = fp."PERIOD_ID"
    WHERE cvr."FORM_TYPE" = 'F460'
    AND 90 < cvr."THRU_DATE" - fp."END_DATE" 
    AND cvr."THRU_DATE" - fp."END_DATE" < 93
    ORDER BY cvr."THRU_DATE" DESC;
    """
).print_table(max_column_width=60)


|------------+----------+------------+------------+------------+-------------|
|  FILING_ID | AMEND_ID |  FROM_DATE |  THRU_DATE | START_DATE |   END_DATE  |
|------------+----------+------------+------------+------------+-------------|
|  2,025,877 |        1 | 2016-07-01 | 2016-12-31 | 2016-07-01 | 2016-09-30  |
|  2,025,893 |        0 | 2016-07-01 | 2016-12-31 | 2016-07-01 | 2016-09-30  |
|  2,025,877 |        0 | 2016-07-01 | 2016-12-31 | 2016-07-01 | 2016-09-30  |
|  2,002,159 |        0 | 2016-07-01 | 2016-12-31 | 2016-07-01 | 2016-09-30  |
|  2,002,159 |        1 | 2016-07-01 | 2016-12-31 | 2016-07-01 | 2016-09-30  |
|  2,076,271 |        0 | 2016-01-01 | 2016-07-01 | 2016-01-01 | 2016-03-31  |
|  2,063,670 |        0 | 2016-01-01 | 2016-06-30 | 2016-01-01 | 2016-03-31  |
|  2,063,664 |        0 | 2016-01-01 | 2016-06-30 | 2016-01-01 | 2016-03-31  |
|  2,063,657 |        0 | 2016-01-01 | 2016-06-30 | 2016-01-01 | 2016-03-31  |
|  2,063,650 |        0 | 2016-01-01 | 2016-06-30 | 2016-01-01 | 2016-03-31  |
|  2,063,624 |        0 | 2016-01-01 | 2016-06-30 | 2016-01-01 | 2016-03-31  |
|  2,063,622 |        0 | 2016-01-01 | 2016-06-30 | 2016-01-01 | 2016-03-31  |
|  2,063,584 |        0 | 2016-01-01 | 2016-06-30 | 2016-01-01 | 2016-03-31  |
|  2,063,575 |        0 | 2016-01-01 | 2016-06-30 | 2016-01-01 | 2016-03-31  |
|  2,063,574 |        0 | 2016-01-01 | 2016-06-30 | 2016-01-01 | 2016-03-31  |
|  2,063,571 |        0 | 2016-01-01 | 2016-06-30 | 2016-01-01 | 2016-03-31  |
|  2,063,566 |        0 | 2016-01-01 | 2016-06-30 | 2016-01-01 | 2016-03-31  |
|  2,063,537 |        0 | 2016-01-01 | 2016-06-30 | 2016-01-01 | 2016-03-31  |
|  2,063,444 |        0 | 2016-01-01 | 2016-06-30 | 2016-01-01 | 2016-03-31  |
|  2,063,428 |        0 | 2016-01-01 | 2016-06-30 | 2016-01-01 | 2016-03-31  |
|        ... |      ... |        ... |        ... |        ... |        ...  |
|------------+----------+------------+------------+------------+-------------|

So, actually, this sort of makes sense: Quarterly filings are for three month intervals, while the semi-annual filings are for six month intervals. And FILING_PERIOD_CD only has records for three month intervals. Let's test this theory by getting the distinct CVR_CAMPAIGN_DISCLOSURE_CD.STMT_TYPE values from these records:


In [16]:
sql_to_agate(
    """
    SELECT UPPER(cvr."STMT_TYPE"), COUNT(*)
    FROM "CVR_CAMPAIGN_DISCLOSURE_CD" cvr
    JOIN "FILER_FILINGS_CD" ff
    ON cvr."FILING_ID" = ff."FILING_ID"
    AND cvr."AMEND_ID" = ff."FILING_SEQUENCE"
    JOIN "FILING_PERIOD_CD" fp
    ON ff."PERIOD_ID" = fp."PERIOD_ID"
    WHERE cvr."FORM_TYPE" = 'F460'
    AND 90 < cvr."THRU_DATE" - fp."END_DATE" 
    AND cvr."THRU_DATE" - fp."END_DATE" < 93
    GROUP BY 1
    ORDER BY COUNT(*) DESC;
    """
).print_table(max_column_width=60)


|--------+---------|
|  upper |  count  |
|--------+---------|
|  SA    | 37,523  |
|  TS    |  1,393  |
|  PE    |    502  |
|  QT    |     85  |
|  **    |     19  |
|  SY    |     16  |
|        |      6  |
|  SE    |      4  |
|  S2    |      4  |
|  S1    |      3  |
|  S     |      1  |
|--------+---------|

At least this is mostly true.


In [ ]: