Advanced: Using the browser's developer tools to find structured data

This article comes from the Silk Data Handbook. Once you have ended up with a suitable spreadsheet, you can import it into your Silk site.

Elements tab

Viewing a page’s HTML structure can be a great help in retrieving the information you need. Sometimes the information can be copy-pasted directly into a spreadsheet; other times you might need to parse it, or automate the data harvesting process across multiple URLs. You can build a more customized scraper with libraries like Beautiful Soup and tools like ScraperWiki.
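As a minimal sketch of what such parsing looks like, here is a small Python script using only the standard library's html.parser (Beautiful Soup would make this shorter, but requires installation). The HTML snippet is a made-up stand-in for markup you might copy from the Elements tab:

```python
from html.parser import HTMLParser

# Hypothetical snippet copied from the Elements tab: a simple stats table.
HTML = """
<table id="stats">
  <tr><th>Team</th><th>Concussions</th></tr>
  <tr><td>Patriots</td><td>4</td></tr>
  <tr><td>Jets</td><td>7</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())

parser = TableParser()
parser.feed(HTML)
# parser.rows now holds one list per table row, ready for a CSV writer.
```

Each row in `parser.rows` can then be written out with the csv module and opened as a spreadsheet.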

Sources tab

In the “Sources” tab you can find all the details behind how a webpage is built. This can help you track which scripts pull the data, and from where. Once you have this information, you may be lucky enough to find a URL that leads directly to clean, structured data. An example? This interactive visualization on NFL concussion counts appears pretty hard to scrape.

But if you click on “Sources” and analyze the code, you’ll find this interesting piece of information.

Can you guess where the link sends you?

www.pbs.org/wgbh/pages/frontline/js/data/concussions/automated_newer.json 
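With a URL like this in hand, a few lines of Python can fetch the JSON and flatten it into CSV for a spreadsheet. This is a sketch: the actual structure of automated_newer.json is not shown in this article, so the sample payload below is a made-up stand-in (a list of rows):

```python
import csv
import io
import json
from urllib.request import urlopen

URL = "http://www.pbs.org/wgbh/pages/frontline/js/data/concussions/automated_newer.json"

def json_to_csv(payload: str) -> str:
    """Flatten a JSON payload shaped as a list of rows into CSV text."""
    records = json.loads(payload)
    out = io.StringIO()
    writer = csv.writer(out)
    for rec in records:
        writer.writerow(rec)
    return out.getvalue()

# In practice you would run: json_to_csv(urlopen(URL).read().decode("utf-8"))
# Here a hypothetical sample stands in for the real response:
sample = '[["Team", "Concussions"], ["Patriots", 4]]'
csv_text = json_to_csv(sample)
```

If the real file turns out to be a list of objects rather than a list of rows, swap `csv.writer` for `csv.DictWriter` with the object keys as field names.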

Network tab

Sometimes parts of a webpage, for example interactive maps, are populated with data retrieved through API calls. A good alternative to scrape this data is to capture this flow of information by clicking on the “Network" tab. Here you’ll be able to monitor the network operations executed by the script used to run the webpage. Sorting for bigger sizes and specific types of operation usually helps to find the script which return the data you need. You then copy paste the results previewed in the “response" tab on the right into a text editor, and you have a file with the results of the API call used to retrieve the data you wanted.