>> Hello. Good morning or good evening, depending on where you’re all located Thank you for joining our Power BI Webinar, and my name is Deepak Shankar I’m a Community Manager within the Microsoft business application group Today, we have Indira Bandari with us who will walk you through a new Power BI features and steps Indira, it’s all yours now >> Thanks Deepak. Today’s Webinar is about Web Scraping in Power BI Web scraping can be done in a lot of ways But before I proceed, I just want to share with you just a small snippet of who I am I’m a Microsoft Data Platform MVP, and I have a Masters degree in Statistics, and I’ve been working on SQL Server since 2002 I have gone through all the changes that SQL Server has gone through, and I’m a speaker and a co-organizer, mainly speak in New Zealand and Australia In my pastime, I love teaching kids I started off teaching kids with game development in 2016, and every year, I do some technology teaching for kids For the past two years, I’ve been teaching Power BI to the kids, and it has been an awesome experience You can see my details if you want to connect with me on the left-hand side. So moving on Today’s agenda is, I want to cover it a little bit of what Web Scraping is, and what use cases will be common scenarios of Web Scraping Also, the different types of Web Scraping that can be done using Power BI So what is Web Scraping? Web Scraping is a process used to extract data from websites If we look at the common scenarios, it can be used to build a price comparison Website We can use Web Scraping to collect prices, product description, and images This data can be used to gain insights, as well as for comparison purposes Also, the next scenario, you can build a product catalog automatically The next scenario is you can use Web Scraping to do competitor analysis, so to extract the competitor data, and also customer sentiment as to how people are feeling about the product in a structured and usable format You can use Web Scraping for financial data analysis to get financial statements in a useful format, and also to analyze the same data to gain some insights The next scenario you can use is LEAD GENERATION So together contact details like e-mail ID, phone numbers for business purposes, or even individual purposes, some of the non-profit organizations can also use Web Scraping to gather this data and then generate leads It can also be used for brand monitoring and reputation So you can use Web Scraping to actively build brand intelligence, and monitor the brand preparation to understand how customers feel about a product or a service Some other scenarios are like gathering data for machine learning Machine learning requests lots of data So Web Scraping can be used to gather huge amounts of data across millions of websites Without Web Scraping, it is humanly impossible to do machine learning Some of the job boards that you see, the job listings, they also use the Web Scrapings to extract the data and display the job listings in a way that is more understandable to the users Also, I think this can be used for MAP compliance MAP compliance, you might wonder what MAP is Map is Minimum Advertised Price So it needs some compliance,

and it needs to be tracked That publicly advertised price across online stores, it can be monitored So monitoring retail channels is important for manufacturers, and this can be easily done by Web Scraping Otherwise, it is very hard to do this manually Also, the social media analysis So for example, if you want to know what people are thinking, you offer your product in Facebook, or Instagram, or Twitter, or other social media You can use Web Scraping, and then you can analyze, gain insights from the data you data gathered using Web Scraping, and get the customer sentiments, and also text analysis So similarly, the news monitoring, if you want to monitor specific content from all news channels, or even hashtags, you can do that using the Web Scraping technique Also, search engine optimization If you want to monitor the search results for your product, you can use the Web Scraping to do the search engine optimization as well You can gather the search engine results and then see whether the trend is going up or down, based on that, you can tweak your tags and things like that These are the common scenarios that I have shared with you now for Web Scraping Moving on, let’s start with the simple Web Scraping scenario, I’ll pick up this link that you are seeing This link is the population of the United States by country, so I’m just taking the United States In this scenario, we are going to use Web Connector Let’s get into the demo part I’m just going to copy this URL I think, I have already, yeah This is the URL that I have pasted, and this is the Website From this, if you look at this Website, so we’re going to scrape this table in to Power BI How do I do that? I just copy this link here, the URL, and then go into Power BI When you open Power BI, you get “Get Data”, so just go “Get Data”, and then you get the Get Data dialog box Here, you can’t see because there are a lot of connectors, you can’t see the Web Connector In order to search for the Web Connector, just go here, and then search Then once you search, you get the Web Connector, just double-click on that, and paste your URL here If you click on “Okay” It’s taking a little while It will display the divisions on the HTML page Hopefully, it’s not taking too much time You have Table 0, just you can preview Table 0 is not what we want Document is just CSS, and if you go to Table 1, so this is what we want So once you preview this one, see how this is not highlighted If you tick this one, only then it gets highlighted and then you can just load the data or if you click on “Transform Data.” If you want to do some manipulation you can click on “Transform Data” and then have a look here If you want to remove some data, you can remove “Change the datatype” you can do that, and the table will be in your Power BI So I can just say, this is ws simple, so you will get the data So I will stop here for the first scenario, which is using the Web Connector So now, I’ll just go back to the Power BI, so we have done this one Next scenario is Get data by example So again, I am going to copy this URL, and let’s see what the URL contains

So what this contains is, this has a proper way of displaying three hourly observations This is the Met service data for New Zealand, for Northland and Auckland We have some towns here; Cape Karikari, Cape Reinga, and things like that So I want to get this into Power BI So we’ll use the same method So I’ll just go into Power BI Again, go into get data and then use the Web Connector again, and then paste the URL here Once you do that, you would expect that to display the information like a table, like how we have seen in the previous scenario But you are not getting that, because the Web Connector is not that polished from Microsoft, so we need to do some modifications So what Microsoft has provided is, you can add a table using examples So that’s what I will click See here there is a small button on the navigator So I’m going to just click that, and what it does is it’s going to give me a web view and a column view So this is the web view, and you can scroll through and see what you are seeing So I want to get these ones So Cape Karikari into one column, and temperature into one column, wind speed into one column, and so on So how do we do that? So when I first started, I tried clicking here but nothing worked, so I started typing here So when I go “Cape Karikari” so this is the one, see that is the one So I’m just going to add that After adding, I want to add the temperature It is not really intuitive to have gray colored column here It took me a while to figure out how to add the next column, so you just go and click here, and then it adds a new column So now, I want to add the temperature Before I add, I just want to see what the temperature is It’s n/a, so I need to add that Add the temperature I need to look for n/a Hopefully, I can find it I think, it went up See, this is the n/a, and I want to add the wind speed, 17 kilometers I will type “wind speed” and “17 kilometers per hour.” Let’s add these three for now Because this is an example, I think, it takes a couple of records to recognize the pattern I will add now Cape Reinga, Cape Reinga at undefined See, now it recognizes the pattern and automatically creates that column If you go again, temperature n/a, and add it and now it has recognized the other If you look at the details, yeah, before that, we will add the wind speed Wind speed is n/a here, so wind speed and if you move out of it, then it recognizes If we look below, so the next one is Kaikohe, which is automatically recognized and add it, and temperature is 11 degrees, and four kilometers is the wind speed Once you say, “Okay” here, you can add remaining columns as well Now, if you click here, then you can preview that and then you can do a transform data, and maybe you can rename the columns to town and this too you can rename it as temperature and also wind speed

What you can do is, you can replace temperature, colon, space with nothing, so that you just get the temperature So replace values, temperature, colon, space and replace with nothing, and it will give you-all the replaced ones Similarly, we can do that with the wind speed So wind speed, colon, space replace all with spaces So this is in the Power Query Editor, and then you can close and apply this one So that’s the second way of doing it with “Add data by example.” Now, let’s go back to the PowerPoint and then let’s see the next scenario Next scenario is get data based on HTML and CSS So I have a blog here This is a famous blog in Power BI, by RADACAD The authors are from New Zealand, so I pick this one So if I go to the RADACAD blog See the blog is here, situation is like this I want to extract these article titles So the way I have done it is, see data one “Time Series Anomalies Detection,” “Same Period Last Year to Date.” Those are the titles I want to extract in my Power BI So if I want to do that, I will just go through this one and go into the Web Connector again Get data, web connector, and paste the URL, and then click “Okay” and just wait for it to load What this displays is the HTML or CSS which we saw in the earlier post, it was named as document So let’s see what it comes up here Well, it’s taking a while Any questions so far? >> No questions Indira, you can continue >> Okay. So here, we can see that it has an element, HTML and it’s a table So I will just click on this and click on “Transform Data” and here, you need to click on “Transform Data.” Now, you see that in the Children, you can see that there is a link, and there are two tables I will just click on this one, then see here that contains a header and a body, and the body contains again a table I’ll just expand that, and then here again there is one division which contains a table Again, when you look at this one, there are two tables here I’ll just expand that, and when you go here, there is nothing here, there is no table link so I won’t expand that,

but when you go here, you can see three tables with the header, division, and footer I will expand that, and if you look at the header, there are two tables, and if you look at DIV, there are tables, and if you look at the footer, there are two tables What I will do is I will just expand the div table to see whether the, and if I see here maybe I’ll expand this one When I expand this one, I’m just interested in the Children because that’s where the table is, and I’m going untick the “Use original column name as prefix.” Now, I can see the two tables, and here also again there are tables I’ll expand these two, and then untick these two, and I’m interested only in the Children If I look at this one, that’s the main and that’s the aside I am interested in the main table, so I’ll just expand that, and see here you can see the name of the article, and that’s what I’m interested in, and then the name as article name this thing So I’m interested in that So I’m just going to use both Name and Children So this is the article So now I can have a look at this one, and then see and expand the table, and again I will include only the Children column, and then this also I will expand, and again we’ll get the Children column So if we look at, yeah, again if we look at that one, there is a division and the header So I think each one has, that’s an image, that’s a header So I’m going to expand this one again So in the applied steps, if you do something wrong, you can just click on this thing and go back So here, I think you can do this using the source View paid source as well to find out which one and it’s going to So what I want is same period last year So I can just find that out So this is the title I want So it’s div class, article id in the article, div class and image So I think because of time constraint I’m just opening Meanwhile, we try again If we go here

Okay. Here, I have expanded the Children, again expanded another Children Within the Children again when I expanded, then the third one, then I could see the text, I think I omitted the text in the previous one In the text, we have the time series are normally and things like that After that, after getting these texts, I wanted to remove the nulls and all the other data that I don’t need What I’ve done is I have filtered the rows, so I’ve unticked all these and then just got the filter rows These are the, scrape through that, just the article titles The way I have gone wrong is I think, I should have included the text as well That’s the third one, third type of web scraping that you can do using the HTML and CSS If we look at the fourth one, this is the web scraping using Power Query Here, this is a bit complicated and this the technique that I’m going to show you has been derived by three people: Miguel, Matt, and Liam Liam wrote a blog post which I have shared in this PowerPoint at the end I’m using that blog post to show you to extract the data for using Power Query So if we look at the link that I’m going to use, this is a hockey statistics for 2017 and 2018, and this table is what I want to scrape So when we look at the URL, it is very simple here It is visible to the users So if we just use that, then what happens is I’m just going to give you an overview before I dive into the demo So when you just use that and scrape using Power Query, you will get only the first page of data which is restricted to about 50 rows, but there are 20 pages of data, and we want all the 20 pages into Power BI to analyze So in this scenario, what we need to do is we need to follow three or four steps First thing is to see what the page number is In the first scenario when you just use this it will go to the first page, and you want to go to the next page till the last page So there are two scenarios here You need to see how many pages are there and also to see how to get all pages into one table So this scenario we will go through, and because the URL is not self-explanatory with the page numbers, you need to use a tool called Fiddler to get the exact URL that uses the page variables So now I’m going to just copy this Let’s dive into the demo So this is Power Query and it does, so I just want to So “Home”. Okay. So “Get Data” and “Web” again, and we’ll copy that, let’s see what we are going to get So we get the table

Let’s look at the table This is what the table is I just click on the table and then go transform data because that’s what we want to do Now, when we transform the data, if we look at the data as mentioned earlier, we get only 50 rows, but we want all the rows that are there in the table, including all the pages To do this, first, what we need to do is we need to pass a parameter for the page number The first step is to create a new parameter, and I will just call that parameter as PageNumber, and I will start that with 1 Current value is 1, and then create a page number Now, all the steps are defined in this thing, so now, using Fiddler, this is the code, this is the URL that was detected in this that shows the page number See page and percent, page equals So we’ll try this one in the browser and see whether the data is displayed When I try this one, see, this is how it displays, and this is on Page 2 because I have parsed the page number 2 to the parameter page I’m going to use this link I’m going to modify in Power BI, in the Power query editor, go to Advanced, and I’m going to modify this URL with that First step is this URL with that, and also, parse the page number parameter To do that, again, I’ll go into Power BI To create a new parameter page number, this is the code that we need to parse to create it as a function We need to create it as a function so that the page numbers are passed If you go here, so now, if you do that, then you can see that this has changed to a number So I’m just going to call this function as, let’s say, PageData So now this is a function If I invoke this function, and let’s say I’m parsing Page 3, and invoke, it will give me Page 3 status So Page 3 starts at 101, so a 101-150 But how do we get all the pages? So in order to do that, we just need to create another query But before we do that, I just want to go through the functions that we used to create that query So if we go here, see here, I have put the all pages code So if you copy this one, of course, if you copy this one maybe into a notepad, yeah, I think I have it in the notepad So you need to remove those bullet points So here, what I’m using is a list generate function So it generates the list of the number of pages So what this is trying to do is this is page data function that we have created So if we invoke the function with Number 1 as the parameter, if it has values, it will give the values

If there is no data on Page 1, then it should give us null Otherwise, it’s going to error So that’s the logic here that is being used So I’m just going to copy this one, and go here, and then I’m going to create a new query and blank query, and I’m going to go to the Advanced Editor, and then remove what is there in that, and then I’m just going to use this query Page data, remember, if you use the function name as something different, you need to change the page data to that function name that you have created So this has created table values, and the number of rows is 20, which is indicating the number of pages as 20 If we look at the table here, you can see all the values Now, what we need to do is we can’t just expand because this is just a list, because we are generating a list So now what we need to do is just convert this list into a table, and then take the parameters as it is, don’t make any changes Now, see, there is an expand button, so unless you convert the list to a table, you can’t expand the column Now, before we expand, let’s see what we can see So this has rows, row numbers, and the names, and age, and all the statistics We can just expand, and make sure when you are expanding, unpick the use original column name as prefix If you keep that, it comes up with column1.pk, column1.name, and things like that, which we don’t want Once you expand, you get the data, and when you look at the data, see it says 956 rows have been imported Here it says 956 throws Let’s go and see how many number of rows are there, let’s say, in the last page Luckily, there are row numbers There are 956 rows, so we have imported all the 956 rows We can just go and then apply the changes, and then do the visualizations or data modeling that we want to do This is the hockey data Now, if you go back to the power query, there is a more detailed explanation in this blog post here by Liam Bastick, who is an Excel MVP, so you can refer to that Lastly, to recap of what we learned We learned what Web Scraping is, and what are some of the common scenarios that we can use Web Scraping, and the simple mode of Web Scraping Get Data by example, we have seen using the Met service data, metadata, and get data from HTML or CSS How to scrape the article names of the blog that we have seen Lastly, we have seen Web Scraping using Power Query, and there are other scenarios as well where you can scrape using R and Python, which I haven’t included I thought that I will not be able to finish demoing those to you That’s the Web Scraping in Power BI Any questions? >> Yeah, no questions Indira until now >> Yeah. I think, that’s the end of my presentation >> Okay. You’re done with the presentation? >> Yes >> Awesome. It’s a very good presentation, and it’s a very good Webinar Do you have something else to proceed on this,

if people want to connect you to the community, like a place where they can touch base with you? >> Yeah. My details are here My Twitter details and my LinkedIn account These are the details, I can share the PowerPoint >> All right. Thank you so much It’s been a very good Webinar >> Thank you. Thanks a lot for the opportunity, Deepak >> Thanks, Indira. Bye