Thank you for using our services! We want you to make the most of these shared investments by putting them to frequent use.
We make these available to you. But we also need to protect our shared investments and make sure that we are using them in ways that do not impact others’ ability to use them. Please read the following to understand whether your intended use might fall under the category of “aggressive web scraping”, and what you can do to avoid it.
What is aggressive scraping?
Over the past few years, we’ve seen a sharp increase in the number of bots and scrapers. These are applications that crawl web pages and services to extract information. Many of them are harmless, and some are even beneficial (e.g., Google’s bot helps our pages show up in search results). But poorly coded bots and scrapers can hurt the performance of the site or service they’re crawling by mimicking what is known as a Denial-of-Service attack, in which a web site is so overrun by traffic that it freezes and can’t respond to normal user interaction.
Most of the aggressive scraping we see is on the MAR web service.
Am I aggressively scraping a web service?
If your application is accessing a service or page thousands of times with no pause in between, then your code can impact the operation of our web servers, as described above. We are especially concerned if the impact occurs during DC business hours (8:00 am to 6:00 pm EST), as our web services are used in many core business operations, both within District government and outside.
So what can I do to use DC web services responsibly?
In general, these bot and scraper best practices will mitigate impacts on our server:
- Use a 3-second delay between requests to allow the server to “breathe”
- Make sure your app requests and respects our robots.txt file
- Leave larger volume processing to off-peak times (11:00 pm to 6:00 am EST)
- If you are using the MAR web service, and have a large number of records to geocode, you can send them in batches of 1000 using findLocationBatch (for xml) or findLocationBatch2 (for json).
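The practices above can be sketched in Python roughly as follows. This is a minimal illustration, not an official client: the helper names (`is_off_peak`, `allowed_by_robots`, `batches`, `send_politely`) and the `send_batch` callable are hypothetical, and the actual call to findLocationBatch or findLocationBatch2 is left to you, since its URL and parameters are not described here.

```python
import time
from datetime import datetime, time as dtime
from urllib import robotparser

DELAY_SECONDS = 3              # pause between requests, as recommended above
BATCH_SIZE = 1000              # records per batch call, as recommended above
OFF_PEAK_START = dtime(23, 0)  # 11:00 pm
OFF_PEAK_END = dtime(6, 0)     # 6:00 am


def is_off_peak(eastern_now):
    """Return True between 11:00 pm and 6:00 am.

    Assumption: the caller passes a datetime already expressed in
    Eastern time; this sketch does no time zone conversion.
    """
    t = eastern_now.time()
    return t >= OFF_PEAK_START or t < OFF_PEAK_END


def allowed_by_robots(robots_lines, user_agent, url):
    """Check a URL against the lines of an already-downloaded robots.txt."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


def batches(records, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send_politely(records, send_batch, delay=DELAY_SECONDS, sleep=time.sleep):
    """Send records in batches, pausing `delay` seconds between calls.

    `send_batch` is a hypothetical callable supplied by the caller that
    performs the real request for one batch and returns its result.
    """
    results = []
    for index, batch in enumerate(batches(records)):
        if index > 0:
            sleep(delay)  # let the server "breathe" between requests
        results.append(send_batch(batch))
    return results
```

In a real application you would typically load the live robots.txt with `RobotFileParser.set_url(...)` followed by `read()`, and call `is_off_peak` before starting a large job so that it can be deferred to the overnight window.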
What if I don’t adopt these practices?
To protect our shared investment, OCTO monitors these services and takes action to make sure that they remain available to all. If we see you using our services in a way that meets the above description of “aggressive”, then we may block your operation from completing and/or block your IP address from making further calls.
What if I have further questions?
Address them to [email protected].
- Continue to Developing Applications with MAR Web Services
- Visit our Connect with Web Services page on Open Data DC
- Reasons to Use Web Services