Advanced: Using the browser's developer tools to find structured data
Viewing a page’s HTML structure can be a great help in retrieving the information you need. Sometimes the information can be copied and pasted directly into a spreadsheet. Other times you might need to parse it, or automate the harvesting process across multiple URLs. You can build a more personalized scraper with libraries like Beautiful Soup and tools like ScraperWiki.
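As a minimal sketch of what such a scraper looks like, here is how Beautiful Soup can pull rows out of an HTML table. The table markup and column names below are invented for illustration; in practice you would fetch the page and point the selector at the element you found in the developer tools.

```python
# A minimal Beautiful Soup sketch: extract rows from an HTML table.
# The HTML snippet and the "#stats" id are invented for illustration.
from bs4 import BeautifulSoup

html = """
<table id="stats">
  <tr><th>Team</th><th>Count</th></tr>
  <tr><td>Bears</td><td>12</td></tr>
  <tr><td>Jets</td><td>9</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for tr in soup.select("#stats tr")[1:]:  # skip the header row
    team, count = [td.get_text() for td in tr.find_all("td")]
    rows.append((team, int(count)))

print(rows)  # [('Bears', 12), ('Jets', 9)]
```

To run this against a live page, you would replace the `html` string with the downloaded page source and adjust the selector to match the element you identified.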
In the “Sources” tab you can find all the details of how a webpage is built. This can help you see which scripts pull in the data, and from where. With that information, you may be lucky enough to find the URL that points directly to clean, structured data. An example? This interactive visualization of NFL concussion counts appears pretty hard to scrape.
But if you click on “Sources” and analyze the code, you’ll find this interesting piece of information.
Can you guess where the link sends you to?
Sometimes parts of a webpage, such as interactive maps, are populated with data retrieved through API calls. A good way to capture this data is to monitor the flow of information in the “Network” tab, which shows the network requests made by the scripts that run the webpage. Sorting by size and by request type usually helps you find the request that returns the data you need. You can then copy and paste the result shown in the “Response” tab on the right into a text editor, and you have a file with the results of the API call used to retrieve the data you wanted.
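The copy-paste step above can also be automated: once the Network tab reveals the endpoint, you can fetch it and save the JSON yourself. The sketch below assumes a JSON endpoint; the example payload and the `concussions.json` filename are invented for illustration, and `fetch_json` is only shown, not called, since the real URL depends on the site you are inspecting.

```python
# Sketch: once the Network tab reveals an API endpoint, fetch its JSON
# directly instead of copy-pasting from the "Response" panel.
import json
from urllib.request import urlopen

def fetch_json(url):
    """Download an API response and parse it as JSON."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

def save_json(data, path):
    """The scripted equivalent of pasting the response into a text editor."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)

# Example with a (hypothetical) response body captured from the Network tab:
raw = '{"season": 2013, "concussions": [{"week": 1, "count": 4}]}'
data = json.loads(raw)
save_json(data, "concussions.json")
```

With the real endpoint in hand, `save_json(fetch_json(url), "data.json")` replaces the manual copy-paste entirely.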