Web scraping using Azure Synapse notebooks



Great business analysts are never saturated with data; they always need more. Data analysis frequently calls for data enrichment: enhancing the dataset we already have by adding extra information. Data enrichment makes our data more useful and helps us get deeper insights. Sometimes this extra data is internal company data, but in other situations we search the internet for third-party data such as historic sales data, market research data, product data, pricing data, etc.


It's great when we find the data available in a downloadable format. From time to time, however, we may need to scrape data directly from an HTML page. The technique of importing information from a website is called web scraping or data scraping.


Data engineers have many tools that can help with querying web data, such as Chrome plugins and Excel Power Query, but my favorite data manipulation tool is Python.


Read my blog post on mssqltips.com to learn how to use Python and an Azure Synapse notebook to scrape an HTML table (the <table> tag in HTML code) from a web page directly into a DataFrame.
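
To give a rough idea of what scraping a <table> involves, here is a minimal sketch that uses only Python's standard library. It is not necessarily the approach from the mssqltips.com article. The names `TableParser`, `scrape_table` and `SAMPLE_HTML` are made up for this illustration, the sample page is invented, and the code assumes the first row of the table holds the column headers and that tables are not nested.

```python
from html.parser import HTMLParser

# Invented sample page; in a notebook the HTML would come from a real URL,
# for example via urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<html><body>
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>24.50</td></tr>
</table>
</body></html>
"""


class TableParser(HTMLParser):
    """Collects the cell text of the first <table> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row currently being read, or None
        self._cell = None       # text fragments of the current cell, or None
        self._in_table = False  # True while inside the first <table>
        self._done = False      # True once the first table has been closed

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._row is not None and tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        # Only text inside a <td> or <th> is kept.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if self._done:
            return
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table" and self._in_table:
            self._in_table = False
            self._done = True


def scrape_table(html):
    """Return the first table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(SAMPLE_HTML)
print(records)
```

A list of dictionaries like `records` can be passed straight to `pandas.DataFrame(records)`. When pandas and an HTML parser such as lxml are installed, `pandas.read_html` offers a shorter route that returns a list of DataFrames, one per table found on the page.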


Have fun


Yours

Maria
