Back when it was very difficult to get vaccine appointments for COVID-19, I created a little script that would notify users when a vaccine appointment was available near them. I circulated it among friends, family, and people I met in NY who brought up having difficulty getting a vaccine. I wanted Tara and me to start a more official website to do it, but she thought there was too much reputational risk in doing so. So I decided to just email one of the sites that was doing something similar (sans the notification part) and offer them the code plus advice on how to scale a notification system like that to hundreds of thousands of people.

Hi Joe, 

I'm a volunteer with NYC Vaccine List and saw your note in the inbox regarding potential notifications. It's too bad Coursicle didn't exist when I was signing up for college classes!

We're not sure yet if we're committed to a notification system but are definitely interested in hearing about your experiences as this feels like a similar problem. To understand better - do you have relationships with the colleges?

I don't think we have specific questions yet, as we are mostly working on keeping the back-end stable and adding more sites. That said, from a user experience standpoint, we certainly realize that if we can enable people to step away from their browser it would be a win.

Would love to hear your learnings and thoughts, 
Sharon



Hi Sharon, 

Thanks for your email. 

To understand better - do you have relationships with the colleges?

Great question: in general, no. We only work with 1 of the 900 colleges we support. In every instance except that one school, we scrape the course data (Python, Requests, BeautifulSoup) from publicly available websites. This is somewhere around 150,000 web requests to hundreds of different servers every 2.5 minutes. We started using Celery to create a distributed cluster of servers to handle that scale. Obviously, you're looking at far fewer pages, so the scraping scale doesn't really need attention (even if you were to expand this beyond NYC).

We're not sure yet if we're committed to a notification system but are definitely interested in hearing about your experiences as this feels like a similar problem. 

The notification scale is what would likely need attention. We started when we were in college and weren't going to pay Twilio to send text messages for us, or even SendGrid to send emails (90% of our users wanted text notifications anyway). The cost was always prohibitive: it would have cost $1000/month to run Coursicle while we were in college if we sent texts via Twilio, even with only 10,000 users. What we figured out was that most carriers support something called an email-to-SMS gateway: you send an email to 9192592136@txt.att.net (the full phone number, plus @txt.att.net to indicate the carrier is AT&T) and it arrives as a text message. Verizon works the same way but with @vtext.com. We simply sent these emails programmatically via a Gmail account, and when we exhausted the daily sending limit we made about 25 more Gmail accounts and round-robined through them.
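
As a rough illustration of the gateway addressing and account rotation described above, here is a minimal Python 3 sketch. It is not Coursicle's actual code: the names `gateway_address` and `SenderPool`, the `daily_limit` parameter, and the example.com accounts are all assumptions made for the example, and only the two gateway domains mentioned in the email are included.

```python
from itertools import cycle

# Gateway domains named in the email above; other carriers are not covered here.
GATEWAY_DOMAINS = {
    "att": "txt.att.net",
    "verizon": "vtext.com",
}


def gateway_address(phone_number, carrier):
    """Build the email address that the carrier's gateway delivers as a text."""
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    if len(digits) != 10:
        raise ValueError("expected a 10-digit phone number")
    return "{0}@{1}".format(digits, GATEWAY_DOMAINS[carrier])


class SenderPool:
    """Hand out sender accounts round-robin, skipping any that hit the cap.

    daily_limit is a hypothetical per-account cap chosen by the caller;
    the counters are never reset here, so a real system would need to
    clear them once a day.
    """

    def __init__(self, accounts, daily_limit):
        self.accounts = list(accounts)
        self.daily_limit = daily_limit
        self.sent = {account: 0 for account in self.accounts}
        self._order = cycle(self.accounts)

    def next_sender(self):
        # Try each account at most once per call, in rotation order.
        for _ in range(len(self.accounts)):
            account = next(self._order)
            if self.sent[account] < self.daily_limit:
                self.sent[account] += 1
                return account
        raise RuntimeError("every sender account has reached its daily limit")


if __name__ == "__main__":
    pool = SenderPool(["sender1@example.com", "sender2@example.com"], daily_limit=2)
    print(pool.next_sender(), "->", gateway_address("9192592136", "att"))
```

Each notification would ask the pool for the next sender and address the message to the gateway address, so the sending load spreads evenly across accounts instead of exhausting one at a time.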

This evening I wrote a very small, hastily constructed script that does both the scraping and the notifications for my parents and my grandmother. I've attached it below for your reference so you can see how straightforward it is.

Best,

Joe Puccio
Co-founder, Coursicle
joe@coursicle.com	



#!/usr/bin/python

# for logging
from datetime import datetime
from pytz import timezone # pip install pytz

# for sending
from email.utils import make_msgid
from smtplib import SMTP_SSL, SMTP_SSL_PORT
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase
from email.mime.text import MIMEText
from email.encoders import encode_base64

# basics
import requests
import time

username = 'uncsender1@gmail.com'
password = '[redacted]'
fromAddress = 'Vaccine Checker '
imapServer = 'imap.gmail.com'
smtpServer = 'smtp.gmail.com'

def log(message, file, path=None, shouldPrint=True):
	if shouldPrint: print(message)
	easternTime = timezone('US/Eastern')
	currentTime = datetime.now(easternTime)
	path = path if path else "logs/"
	with open(path+file, "a") as logFile:
		logFile.write("{0} - ".format(currentTime.strftime("%B %d, %Y, %I:%M:%S %p")) + message+"\n")

def sendEmail(email, toAddress):
	global smtpServer, fromAddress, username, password

	# Connect, authenticate, and send mail
	smtpConnection = SMTP_SSL(smtpServer, port=SMTP_SSL_PORT)
	# Show SMTP server interactions
	# smtpConnection.set_debuglevel(1)
	smtpConnection.login(username, password)
	smtpConnection.sendmail(fromAddress, toAddress, email.as_string())

	# disconnect
	smtpConnection.quit()

def createEmail(content, toAddress):
	
	# create the MIME object (multipurpose internet mail extension)
	email = MIMEMultipart()
	email.add_header("From", fromAddress)
	# address the message to the recipient being notified
	email.add_header("To", toAddress)

	# every notification uses the same fixed subject line
	subject = "Potential Vaccine Appointment"
	email.add_header("Subject", subject)

	# create a unique Message-ID for this notification
	messageID = make_msgid()
	email.add_header("Message-ID", messageID)

	email.attach(MIMEText(content, "plain"))
	# optional: could add HTML (MIMEText(template, "html"))
	# optional: could add attachments like images: MIMEBase("application", "octet-stream")

	return email

def notifyUsers(users, url, change):

	for user in users: 
		content = """
		Vaccine checker has just noticed a change on {url}

		Please note that this very well may be a false positive. Something on the page could have changed that does not indicate a vaccine appointment.

		The following content is new:
		{content}
		""".format(url=url, content=change)
		email = createEmail(content, user)
		sendEmail(email, user)

	
def compareHTML(old, new):

	differences = None

	# for testing notifications, uncomment below:
	# differences = ["Joe has nice hair", "also an ok beard"]

	if old.content != new.content:
		# compare line by line (iterating over the raw content directly
		# would walk it one character at a time)
		oldLines = set(old.content.splitlines())
		differences = []
		for line in new.content.splitlines():
			# keep lines that are new and not just whitespace
			if line not in oldLines and line.strip():
				# .content is raw bytes; on Python 3, decode so the line can be joined into the email text
				if not isinstance(line, str):
					line = line.decode("utf-8", "replace")
				differences.append(line)

		if len(differences) == 0:
			differences = None

	return differences


def checkVaccine(url):
	
	try:
		html = requests.get(url, timeout=60)
	except requests.exceptions.RequestException:
		print("Hit a request error, just going to skip this check")
		time.sleep(10)
		return

	if url not in history:
		history[url] = html
		print("Initialized html for {url}".format(url=url))
	else:
		differences = compareHTML(history[url], html)
		# remember the latest version so the same change is not reported on every pass
		history[url] = html
		if differences is not None:
			difference = "\n".join(differences)
			notifyUsers(trackers[url], url, difference)
			message = "Detected change on {url}. Notified {users} of the following new content: {content}".format(url=url, users=trackers[url], content=difference)
			log(message, "changes.log")
		else:
			print("No change detected for {url}".format(url=url))




# init
history = {}

# joe
joe = ["josephpuccio@gmail.com"]

# triangle people
triangleTrackers = ["elizabeth@pooch.us", "9194231671@txt.att.net", "phil@pooch.us", "9192259013@txt.att.net"]
nyTrackers = []

# to find new options, just go to https://covid19.ncdhhs.gov/findyourspot
# or whatever state's main site which has a list of all the counties
# and go through the counties that are nearby and add them

# set who is tracking what urls
trackers = {
	"https://www.unchealthcare.org/coronavirus/vaccines/phase-1b-covid-19-vaccine/" : triangleTrackers, #UNC
	"https://app.acuityscheduling.com/schedule.php?owner=21692896&appointmentType=19824511" : triangleTrackers, #Chatham county
	"https://www.dcopublichealth.org/services/communicable-diseases/coronavirus-disease-2019/covid-19-vaccines/-fsiteid-3" : triangleTrackers, #Durham county
	"https://www.alamance-nc.com/healthdept/" : triangleTrackers, #Alamance county
	"https://www.conehealth.com/covid-19-information/covid-19-vaccine-information/" : triangleTrackers, #Cone Health (Alamance county)
	"https://nycvaccinelist.com/" : nyTrackers # dozens of NYC locations
}

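# add in joe everywhere (several urls share the same list, so only add him once per list)
for url, users in trackers.items():
	for address in joe:
		if address not in users:
			users.append(address)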

# main loop
while True:

	for url in trackers: 
		checkVaccine(url)
	
	time.sleep(10)
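
The comparison step in the script above can also be sketched as a standalone Python 3 function that works on plain strings, which makes it easy to try out without fetching any pages. The name `new_lines` and the choice to strip surrounding whitespace before comparing are assumptions for illustration, not part of the original script.

```python
def new_lines(old_html, new_html):
    """Return the non-blank lines of new_html that appear nowhere in old_html.

    Lines are compared after stripping surrounding whitespace, so a change
    in indentation alone is not reported.
    """
    seen = set(line.strip() for line in old_html.splitlines())
    added = []
    for line in new_html.splitlines():
        stripped = line.strip()
        if stripped and stripped not in seen:
            added.append(stripped)
    return added


if __name__ == "__main__":
    before = "<h1>Vaccines</h1>\n<p>No appointments</p>\n"
    after = "<h1>Vaccines</h1>\n<p>Appointments available</p>\n"
    for line in new_lines(before, after):
        print(line)
```

A comparison like this only reports added lines, so a page that merely removes content produces an empty result. Any added line is still reported, whether or not it has anything to do with appointments, so false positives remain possible.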