I am currently in the process of acquiring my US citizenship. It's a long process without a set schedule, and I found myself frequently visiting the USCIS status page to see if any update had been posted. After a few iterations I grew tired of going to the website, entering my case number, navigating to the status page, and checking for changes, so I decided to automate these steps with a Python script.
In this post I detail the steps of the script, which does the status check for me and shoots me an e-mail when it detects a change. I then show how I made it into a cron job so that it runs automatically on a schedule of my choosing.
Here I leverage Selenium, BeautifulSoup, smtplib, and python-crontab.
Note that building this automaton requires scouting the pages it navigates in order to identify the specific elements it needs to interact with; in this case, only a handful.
In [1]:
from selenium import webdriver
import os
from bs4 import BeautifulSoup as Soup
from shutil import which
import sys, IPython, platform
print(f'Setup: {platform.machine()} running {" ".join(platform.linux_distribution()[:2])}')
print(f'Python: {sys.version[:5]}\nIPython: {IPython.__version__}')
The last step I went through in my process was an interview with a case officer. The phrase currently on my status page is therefore "we scheduled an interview", and its disappearance should indicate a change in my status. Let's first store that in a token...
In [2]:
token = 'we scheduled an interview'
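A plain substring test against this token is sensitive to capitalization and to stray whitespace in the scraped text. As a minimal sketch, assuming that normalizing both sides is acceptable here (the helper names normalize and token_present are hypothetical and not part of the script), the comparison could look like this:

```python
def normalize(text):
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return " ".join(text.lower().split())


def token_present(status_text, token):
    """Return True if the token occurs in the status text after normalization."""
    return normalize(token) in normalize(status_text)


# Made-up sample text for illustration only; not actual USCIS wording.
sample = "On some date,\n  We Scheduled   an Interview for your case."
print(token_present(sample, "we scheduled an interview"))
```

The script in this post uses the simpler "token in text" check, which works as long as the page wording stays exactly the same.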
Here, I use phantomjs to avoid opening a browser window; phantompath gives the location of phantomjs on my computer. If you prefer chromedriver, Miguel Grinberg (the Flask guy) has a recipe at the cost of a couple of extra lines.
In [3]:
phantompath = which('phantomjs')
browser = webdriver.PhantomJS(executable_path=phantompath)
Navigate to the USCIS landing page...
In [4]:
browser.get('https://egov.uscis.gov/casestatus/landing.do')
... find the input box to enter my case number, stored in the variable case_num...
In [5]:
caseNo = browser.find_element_by_id('receipt_number')
caseNo.send_keys(case_num)
... find the button to initiate the lookup and click...
In [6]:
srchButtn = browser.find_element_by_name('initCaseSearch')
srchButtn.click()
... now we're on the status page, get the page source and parse it with BeautifulSoup
In [7]:
pageSrc = browser.page_source
pageSoup = Soup(pageSrc, 'lxml')
... from inspecting the page source, I know that the text I'm looking for is under tag "p"...
In [8]:
targetSection = pageSoup.find('p')
... and is the first item yielded by the children attribute of targetSection (an iterator over the tag's direct children)...
In [9]:
targeText = list(targetSection.children)[0]
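To make "the first child of the first p tag" concrete without needing a live page, here is a small standard-library sketch. It deliberately swaps BeautifulSoup for Python's built-in html.parser, so it is an illustration of the idea rather than the code the script runs; the class name FirstParagraphText and the sample HTML are made up.

```python
from html.parser import HTMLParser


class FirstParagraphText(HTMLParser):
    """Collect the text directly inside the first <p>, before any nested tag."""

    def __init__(self):
        super().__init__()
        self.in_first_p = False
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "p" and not self.in_first_p:
            self.in_first_p = True
        elif self.in_first_p:
            # A nested tag marks the end of the first text child.
            self.done = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_first_p:
            self.done = True

    def handle_data(self, data):
        if self.in_first_p and not self.done:
            self.chunks.append(data)


# Made-up markup standing in for the status page.
sample_html = (
    "<html><body><h1>Case Status</h1>"
    "<p>On some date, we scheduled an interview.<a href='#'>More</a> trailing</p>"
    "<p>Second paragraph</p></body></html>"
)

parser = FirstParagraphText()
parser.feed(sample_html)
first_text = "".join(parser.chunks)
print(first_text)
```

Only the text before the nested link is kept, which mirrors taking element 0 of the paragraph's children.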
... sampling some of the text returns...
In [10]:
targeText[20:100]
Out[10]:
... well, the status still reflects the interview, so nothing happens after this and the process goes back to sleep. However, the rest of the code shows the next steps in case of a change, namely sending myself an e-mail:
In [13]:
import smtplib

# from_ and to hold the sender and recipient addresses
if token not in targeText:
    message = f'USCIS CASE UPDATE: \n {targeText}'
    try:
        smtpObj = smtplib.SMTP('localhost')
        smtpObj.sendmail(from_, to, message)
    except smtplib.SMTPException:
        print("Problem sending email")
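The message above is a bare string with no Subject, From, or To headers, which some mail servers and clients handle poorly. As a hedged alternative sketch (the function name build_update_email and the example address are placeholders, not part of the original script), the standard library's EmailMessage can build a message with explicit headers:

```python
from email.message import EmailMessage


def build_update_email(status_text, sender, recipient):
    """Build a message with explicit Subject, From, and To headers."""
    msg = EmailMessage()
    msg["Subject"] = "USCIS case update"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(status_text)
    return msg


# Placeholder address and made-up status text, for illustration only.
msg = build_update_email("Sample status text.", "me@example.com", "me@example.com")
print(msg["Subject"])
```

A message built this way can be handed to smtplib's SMTP.send_message() instead of calling sendmail() with a raw string.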
The full source code is available at the end of the page. I saved this code as CheckCaseStatus.py, and the snippets below show how I add it to my cron tasks.
First import CronTab and instantiate a cron object.
In [14]:
from crontab import CronTab
myCron = CronTab(user=True)
I'm the only user on my computer, so setting user=True is sufficient. On a multi-user setup, an actual username should be specified. Now add CheckCaseStatus.py as a new job to the cron instance...
In [15]:
# expand '~' to an absolute path; cron does not run from the script's directory
runpath = os.path.expanduser('~/DEV/MyCronJobs/CheckCaseStatus.py')
In [16]:
job = myCron.new(command=f'python {runpath}')
... set the running schedule - check for a change every 5 days - and save...
In [17]:
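# pin minute and hour too (09:00 is arbitrary), or the job fires every minute on matching days
job.setall('0 9 */5 * *')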
myCron.write()
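The schedule above relies on a day-of-month step of 5, which cron writes as */5. In standard cron syntax the step counts from the first day of the month and restarts every month, so the gap between runs is not always exactly five days. A small standard-library sketch (the function name days_matching_step is hypothetical) lists the days such a field matches:

```python
import calendar


def days_matching_step(year, month, step):
    """Days of the month matched by a cron day-of-month field of '*/step'."""
    last_day = calendar.monthrange(year, month)[1]
    return list(range(1, last_day + 1, step))


print(days_matching_step(2023, 1, 5))  # a 31-day month
print(days_matching_step(2023, 2, 5))  # a 28-day month
```

In a 31-day month the job runs on the 31st and then again on the 1st of the next month, which is harmless for a status check but worth knowing.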
In [ ]:
# CheckCaseStatus.py
from selenium import webdriver
import os
from bs4 import BeautifulSoup as Soup
import smtplib
from shutil import which


def check_uscis_page(caseNum, phantompath):
    """
    opens uscis webpage in the background (phantomjs)
    navigates to case status page (requires case number)
    returns the first child of the first <p> tag on that page
    """
    browser = webdriver.PhantomJS(executable_path=phantompath)
    browser.get('https://egov.uscis.gov/casestatus/landing.do')
    caseNo = browser.find_element_by_id('receipt_number')
    caseNo.send_keys(caseNum)
    srchButtn = browser.find_element_by_name('initCaseSearch')
    srchButtn.click()
    pageSrc = browser.page_source
    pageSoup = Soup(pageSrc, 'lxml')
    browser.close()
    targetSection = pageSoup.find('p')
    return list(targetSection.children)[0]


def check_update(targeText, token):
    # an update is assumed once the expected phrase is no longer on the page
    return token not in targeText


def shoot_email(text, from_, to):
    to = [to]
    message = f'USCIS CASE UPDATE: \n {text}'
    try:
        smtpObj = smtplib.SMTP('localhost')
        smtpObj.sendmail(from_, to, message)
    except smtplib.SMTPException:
        print("Problem sending email")


if __name__ == '__main__':
    myEmail = os.getenv('MYGMAIL')
    case = os.getenv('USCIS_CASE_NO')
    phantompath = which('phantomjs')
    retrievedText = check_uscis_page(case, phantompath)
    if check_update(retrievedText, 'we scheduled an interview'):
        shoot_email(retrievedText, myEmail, myEmail)
In [ ]:
# CronScheduler.py
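from crontab import CronTab
myCron = CronTab(user=True)
# full path to the script, since cron does not run from the script's directory
job = myCron.new(command='python ~/DEV/MyCronJobs/CheckCaseStatus.py')
# minute 0, hour 9 (arbitrary), every 5th day of the month
job.setall('0 9 */5 * *')
myCron.write()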
Done, and all that in 50 or so lines. The last thing left is to run python CronScheduler.py to get the program going.
Happy hacking!