Testimony of Lindsey V. Parker, Chief Technology Officer
Office of the Chief Technology Officer
Before the Committee on Public Health
Text of Testimony
Good afternoon, Chairman Gray, members of the Committee on Public Health and staff – I hope you and your families are safe and healthy, despite what the last year has thrown our way. I am Lindsey Parker, the Chief Technology Officer for DC Government. I am testifying today on behalf of the Bowser Administration regarding the Microsoft vaccine portal used by the DC Government to allow the public to schedule vaccination appointments.
Last week, visitors to the portal had to deal with an onslaught of error messages and long wait times as they tried to navigate the site and book a vaccine appointment. Understandably, this only served to amplify the stress and frustration most of us are feeling during this incredibly challenging moment as we look to recover and reopen. And for that I am very sorry.
Despite these issues, all of last week’s appointments were booked on the portal within minutes of the issues being resolved, and the people who showed up for their scheduled vaccination appointments this week were able to receive the vaccine.
In keeping with what we’ve seen with the portal since launching in December, the vaccine distribution program, including the portal, will continue to be tweaked and improved along the way as we learn more about vaccine availability and demand.
But the bottom line is that DC needs more vaccine.
Today, I’m offering testimony regarding how the portal came to be, what went wrong last week, and the direction we are taking moving forward.
How We Got Here
At the onset of DC’s public health emergency, DC Health recognized the need to stand up a technology solution that would allow residents to schedule Covid tests at sites across the District. DC Health asked the team at the Office of the Chief Technology Officer (or OCTO) to help find a solution. At that time, due to the many application development requests we were receiving from agencies to help stand up in house developed solutions, I recognized that OCTO had two choices:
- Option 1: Build a mass testing solution in house; and, as a result, redirect the resources of our development team away from more than 20 agency development asks that were in queue.
- Option 2: Find a third party solution that could securely and reliably meet the scale and custom development needs to fit the ever changing demands of this pandemic response; and, as a result, require fewer resources from our in house development team.
After researching possible solutions, including discussing with our networks of state and local technologists and public health experts, we narrowed down the options. In the end, Microsoft Consulting Services offered the only solution that was able to accommodate our needs in March 2020. Microsoft, DC Health and OCTO set up a regular project cadence so as to inform Microsoft’s team of what was needed for the design, build, release and continuous improvements to the mass testing application. The mass testing solution launched publicly on March 25, 2020. The teams worked well together – and, as of earlier this week, we know that almost 350,000 tests have been scheduled through the application.
In September 2020, DC Health approached OCTO again for help in determining a technology solution for managing vaccination distribution. We evaluated a number of solutions; most were conceptual at best and required significant work before being ready for the public. DC Health and OCTO’s established work cadence and familiarity with the Microsoft platform and team ultimately factored into the selection of the Microsoft vaccine portal solution, as did our familiarity with the scheduling functionality in the public testing solution.
The DC vaccine portal, as developed and hosted by Microsoft, was launched on December 20, 2020. We’ve released 16 versions of the portal since launching. More than 45,000 appointments have been booked through the portal. Some of the enhancements to the site over the past few months include:
- Enhanced vaccination site management: the website will automatically hide vaccination sites where all appointments are booked; previously, this was manually performed and would lead to the public seeing sites without available appointments
- Streamlined process: the removal of questions regarding insurance information will further streamline the process of booking an appointment
- Site navigation improvements: updated help text and easier to understand buttons will make the website easier to navigate
- Additional confirmation options: users will now have the option to print a confirmation page in lieu of showing an email
The meeting cadence and the structured roles and responsibilities of the various teams involved have allowed for updates and continuous improvements to the portal each week since launching:
- DC Health determines eligibility criteria, process flow and data collection needs.
- OCTO helps translate those needs to Microsoft, while ensuring that our email and contact center platforms are ready for whatever is needed to support.
- Microsoft is responsible for code changes, code fixes, testing, managing the infrastructure on which the system sits and releasing DC Government-approved versions publicly.
In late January, as eligibility criteria started to expand to the 65+ community, we recognized that there was an infrastructure capacity issue. In a matter of seconds, multiple users were vying for the same small number of appointments. To handle the load of that simultaneous clicking, down to the microsecond, and to determine which of 100 users actually gets an appointment while the rest have to go looking for a new one, the site needed faster and more redundant infrastructure to make sure it didn’t slow down. Microsoft indicated that it increased the infrastructure availability for the DC portal. Additional improvements to the user interface of the portal were made to accommodate that expanding user pool. That said, I asked to meet with Microsoft to discuss further improvements needed to the site to improve the user experience. On February 9th, I shared with Microsoft executives a few of our key concerns, including the usability of the site, the elasticity of the infrastructure, whether functional testing was robust enough, and whether further staffing support from Microsoft was needed to make the project successful.
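The contention described above, where many users click on the same appointment at nearly the same instant and only one can receive it, can be illustrated with a minimal sketch. This is not the portal's actual implementation, which the testimony does not describe; the class name `AppointmentSlot`, the slot identifier, and the use of an in-process lock are all assumptions made for illustration. A production system would more likely rely on a database-level atomic update.

```python
import threading


class AppointmentSlot:
    """Hypothetical model of one bookable appointment; the first claim wins."""

    def __init__(self, slot_id):
        self.slot_id = slot_id
        self.booked_by = None
        self._lock = threading.Lock()

    def try_book(self, user_id):
        # The lock makes "check if free, then assign" a single atomic step,
        # so two users can never both see the slot as free.
        with self._lock:
            if self.booked_by is None:
                self.booked_by = user_id
                return True
            return False


def simulate_rush(num_users=100):
    """Have num_users threads race for one slot and report who succeeded."""
    slot = AppointmentSlot("hypothetical-slot-1")
    results = {}

    def attempt(user_id):
        results[user_id] = slot.try_book(user_id)

    threads = [threading.Thread(target=attempt, args=(u,)) for u in range(num_users)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    winners = [u for u, ok in results.items() if ok]
    return slot, winners


slot, winners = simulate_rush(100)
print(f"{len(winners)} of 100 users booked the slot")
```

Every user who loses the race has to go back and look for another appointment, which is why a small number of slots and a large number of simultaneous users generate so many extra requests.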
The basis of my concerns centered around a notion that I brought up during my performance hearing last week. Tech solutions for local government uses are more difficult to stand up than typical consumer technology tools we are accustomed to using in our daily lives. For instance, Amazon only has to think about catering their tech solutions to a subset of users who make a decision to buy products from them. In local government, we have to think about every resident, every business and every visitor, and ensure that our solutions are accessible, usable and secure to each and every one. Major technology vendors often don’t think about these subtle differences between the tech savviness of a paying consumer versus everyone else that a city must service.
After my conversation with Microsoft in early February, we recognized a need to be more hands on with project managing the portal changes necessary to accommodate the larger audience of users on the site. Our application development team took on a larger role in determining priorities needed in future releases. We also decided that OCTO needed to be involved in functional testing of the portal, not just user acceptance testing.
Last Week
Unfortunately, we didn’t sound the alarms with Microsoft early enough to handle the massive uptick in usage we experienced last week. As a result, we saw three days of portal openings that resulted in extreme frustration to users and served to dismantle public trust in the vaccine portal. For that I am incredibly sorry.
In expanding eligibility criteria to 18 to 64 year olds with a list of medical conditions, we went from peaks of 62 concurrent users racing to book an average of 3,000 appointments at any given time on the portal to upwards of 8,780 concurrent users on Thursday, February 25; 20,202 concurrent users on Friday, February 26; and 11,247 concurrent users on Saturday, February 27, with no equivalent increase in available vaccine appointments. This created an even more extreme bottleneck as users moved through the portal’s workflow.
In previous weeks, we might have seen a total of 100,000 system requests during the 9am - 10am hour, as users click their way through the application and requests are sent to servers. Last Thursday, we saw 1.32 million system requests in that same window, 7.38 million system requests on Friday, and 3.8 million on Saturday.
That +6500% jump in system requests on Thursday resulted in users seeing more than 1,000 error messages. The Microsoft engineering team wasn’t expecting such a steep jump in traffic and as a result the platform’s service protection limits activated while the system worked to determine if the traffic was indeed legitimate. This caused the site to be very slow for almost thirty minutes. DC Government would later learn that the portal tier of the site had been supported by a range of 15 to 25 servers – as had been the case in previous weeks with plenty of availability. On Thursday, 24 servers of the 25 available were used to support the portal. The web tier of the site was supported by an additional 28 servers. Once traffic began moving again, it was determined that a needed workflow was missing – and 18 to 64 year olds with medical conditions weren’t able to book vaccine appointments. Microsoft’s developers corrected the workflow and the eligibility criteria in the portal were updated by 9:51am. On Thursday, 4,468 appointments were booked before 9:54am.
Having been told by Microsoft on Thursday that infrastructure capacity had been increased, we moved forward with the Friday portal launch. On Friday, we saw a 130% increase in concurrent users and a 459% increase in system requests made to move through the portal. Those Friday users experienced 990,000 errors out of 7.38 million system requests to get through the system. Of the new 30 server maximum, the system used 25 servers. Given this additional spike in system requests, service protection limits were activated, which essentially throttled the number of requests users could make in the portal in a given amount of time without seeing an error message. The throttling made the system very slow for about 15 minutes, after which traffic returned to normal speeds. Service protection limits are an industry best practice – especially when servers are being provided to a customer as a service – in order to prevent, for instance, a bad actor from automating requests that overwhelm the server, or unnecessary and costly overuse of a server. That said, it makes for a frustrating user experience in an already challenging process. On Friday, 4,617 appointments were booked before 9:32am.
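The throttling behavior described above, where a user can make only so many requests in a given amount of time before seeing an error, can be sketched with a token bucket, one common way to implement such limits. This is an illustration only: the testimony does not say how Microsoft's service protection limits are implemented, and the class name `TokenBucket`, the capacity of 5, and the refill rate of 1 request per second are hypothetical values chosen to keep the example small.

```python
import time


class TokenBucket:
    """Hypothetical per-user limiter: each request spends one token."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock  # injectable so the demo below can control time
        self.tokens = float(capacity)
        self.last_refill = clock()

    def allow(self):
        # Add back tokens for the time elapsed since the last call,
        # never exceeding the bucket's capacity.
        now = self.clock()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last_refill = now

        if self.tokens >= 1:
            self.tokens -= 1
            return True   # request is served
        return False      # request is throttled; the user sees an error


# Demo with a fake clock so the outcome does not depend on real timing.
fake_now = [0.0]
bucket = TokenBucket(capacity=5, refill_per_second=1, clock=lambda: fake_now[0])

burst = [bucket.allow() for _ in range(8)]       # 8 rapid clicks at t = 0
fake_now[0] += 2.0                               # the user waits two seconds
after_wait = [bucket.allow() for _ in range(3)]  # 3 more clicks

print("burst:", burst)
print("after waiting:", after_wait)
```

In the burst, the first five requests succeed and the rest are refused until tokens refill, which mirrors the experience of a user clicking quickly through a bottleneck and hitting error messages.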
Microsoft told us that server capacity was going to increase in time for Saturday’s launch, to make up for the fact that 18 to 64 year olds with medical conditions in priority zip codes had under 3 minutes to book on Thursday before appointments ran out. Even so, on Saturday the portal froze for 15 minutes. The server capacity was increased by 30%. Upon investigation over the weekend, Microsoft found that when their development team cleared the application cache, the rebuild of that cache near or during the start of opening the portal further slowed the site’s responsiveness. In what could be referred to as a perfect storm, three things combined: high numbers of concurrent users, those same users furiously working through a challenging bottleneck workflow that draws a lot of compute power, and the clearing of the application cache during peak load on the system. The portal froze, and Microsoft had to restart the system to restore normal traffic flow. On Saturday, 3,437 appointments were booked before 9:41am.
Moving Forward
In order to help flatten traffic on the portal so as to prevent frustrating error messages, sluggish processes and any further disruption to quick vaccine appointment booking, Microsoft has worked since Saturday on a number of improvements for today and tomorrow’s portal launches, including:
- Increased server availability for both the portal and web tiers, two and ten times the amount available last week, respectively
- More responsive portal, given some efficiencies in the process and design of the site
- More user focused messages about what happened when the user doesn’t meet the eligibility requirements or when no more appointments are available
- Removal of the difficult CAPTCHA message
Additionally, this morning, users would have seen a less jarring message than a typical error message, letting them know that 3,000 users are being allowed into the portal on a regular cadence to help stabilize the traffic. The message also instructed users to stay on the page and keep their browser open to see if they are able to get into the portal before appointments are booked for the day. We know thousands of people saw that message and never saw the second page of the portal. While not perfect, we felt that this was a less frustrating user experience than rushing through a site just to get an error message for the next two days.
Shortly before going live this morning, we had more than 24,000 users waiting to enter the vaccine portal. Once we went live and activated the button, it took just 6 minutes and 48 seconds to book 4,622 appointments on the online portal.
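The batched admission described above can be sketched as a simple waiting room that lets a fixed number of users in at a time and stops once appointments run out. This is a simplified model under stated assumptions, not the portal's actual logic: the function name `run_waiting_room` is hypothetical, users are assumed to be admitted in arrival order, and every admitted user is assumed to book exactly one appointment if any remain. The inputs reuse figures from the testimony (24,000 waiting users, batches of 3,000, and 4,622 appointments).

```python
from collections import deque


def run_waiting_room(waiting_users, batch_size, appointments):
    """Admit users in fixed-size batches until appointments are gone.

    Assumption: each admitted user books one appointment if any remain.
    """
    queue = deque(waiting_users)
    booked = 0
    batches_admitted = 0

    while queue and booked < appointments:
        batches_admitted += 1
        # Admit up to batch_size users from the front of the queue.
        for _ in range(min(batch_size, len(queue))):
            queue.popleft()
            if booked < appointments:
                booked += 1

    return {
        "batches_admitted": batches_admitted,
        "booked": booked,
        "never_admitted": len(queue),
    }


summary = run_waiting_room(range(24_000), batch_size=3_000, appointments=4_622)
print(summary)
```

Under these assumptions, most waiting users are never admitted because appointments run out first, which is consistent with thousands of people seeing only the waiting message. Batching smooths the load on the servers; it does not change the number of appointments available.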
Our close monitoring of social media, community chat rooms, our own community networks, and more traditional feedback loops like email and phone, revealed no major issues, no unexpected error messages and no negative feedback on the user experience. However, the truth remains that there were thousands of happy residents this morning, but exponentially more still disappointed due to a lack of additional vaccination appointments.
Going forward, the plan is to further improve the user experience, and to better predict traffic coming to the site, by switching over to a pre-registration system for making vaccination appointments. Under the new system, individuals will be able to provide their information to DC Health through a pre-registration website or by calling the call center. As appointments are made available, individuals who have pre-registered will receive an email, phone call, and/or text message alerting them that they have an opportunity to make a vaccination appointment.
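The pre-registration flow described above can be sketched as a list of registrants who are notified as appointments become available. This is only an illustration: the names `Registration` and `PreRegistrationList` are hypothetical, and inviting people in the order they registered is an assumption made to keep the example simple. The testimony does not state how registrants will be selected.

```python
from dataclasses import dataclass


@dataclass
class Registration:
    name: str
    channels: list        # any of "email", "phone", "text"
    invited: bool = False


class PreRegistrationList:
    """Hypothetical store of pre-registered individuals."""

    def __init__(self):
        self._registrations = []

    def register(self, name, channels):
        self._registrations.append(Registration(name, list(channels)))

    def invite(self, available_appointments):
        """Invite up to available_appointments people not yet invited.

        Assumption: invitations go out in registration order.
        Returns one (name, channel) notice per contact method.
        """
        notices = []
        invited_count = 0
        for reg in self._registrations:
            if invited_count >= available_appointments:
                break
            if reg.invited:
                continue
            reg.invited = True
            invited_count += 1
            for channel in reg.channels:
                notices.append((reg.name, channel))
        return notices


registry = PreRegistrationList()
registry.register("Resident A", ["email"])
registry.register("Resident B", ["text", "phone"])
registry.register("Resident C", ["email"])

first_round = registry.invite(available_appointments=2)
second_round = registry.invite(available_appointments=5)
print(first_round)
print(second_round)
```

Because invitations are sent only when appointments exist, the number of people arriving at the site at any moment is bounded by the number of invitations issued, rather than by everyone who is eligible.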
Mr. Chairman, many of your colleagues have requested a brief demo of how the pre-registration site will work and we wanted to walk you through the process, today.
[Demo Pre-registration Process]
Despite implementing a process that will eliminate some of the frantic rush we’ve seen on Thursdays and Fridays to book vaccination appointments, we still have a supply and demand problem. And the real solution to our problem is that DC needs more vaccine.
This concludes my presentation. I can address your questions at this time.