Extracting all URLs on a Web Page with Chrome Developer Tools
Posted on April 1, 2021 in Google Chrome, JavaScript by Matt Jennings
Original Information
Thank you to Shan Eapen Koshy for posting a YouTube video on how to do this.
- In Chrome, go to the website that you want to extract links from, like https://www.codeschool.com/.
- Open Chrome Developer Tools by pressing Cmd + Opt + i (Mac) or F12 (Windows).
- Click the Console panel near the top of Chrome Developer Tools.
- Inside the Console panel paste the JavaScript below and press Enter:
  var urls = document.getElementsByTagName('a');
  for (var i = 0; i < urls.length; i++) {
    // Loop by index: for...in would also visit "length", "item" and "namedItem" and log undefined.
    console.log(urls[i].href);
  }
Now you will see all the links from that particular web page.
- You can also click the Undock into a separate window button (in the upper-right of Chrome Developer Tools and just left of the X that you can click to close Chrome Developer Tools). This will open a separate window that only displays Chrome Developer Tools along with the extracted links.
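If you would rather have the links as one list you can copy than as separate console lines, the same loop can fill an array instead. The snippet below is only a sketch of one way to do that and is not from the original video: `collectLinks` is a hypothetical helper name, and skipping empty links and dropping duplicates are my own assumptions about what is useful.

```javascript
// Hypothetical helper: gather unique link URLs from an array-like of <a> elements.
function collectLinks(anchors) {
  var seen = Object.create(null); // plain lookup table without inherited keys
  var links = [];
  for (var i = 0; i < anchors.length; i++) {
    var href = anchors[i].href;
    // Skip anchors without a URL and URLs that were already collected.
    if (!href || seen[href]) { continue; }
    seen[href] = true;
    links.push(href);
  }
  return links;
}

// In the Console panel (the check lets the snippet load outside a browser too):
if (typeof document !== 'undefined') {
  console.log(collectLinks(document.getElementsByTagName('a')).join('\n'));
}
```

Printing the list as a single string joined by line breaks makes it easier to select and copy the whole result from the Console panel in one go.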
Updated (April 1, 2021)
Someone in the comments asked how they can return only URLs containing “abc” or “defg”. The steps below cover that, along with how to make the code compatible with older browsers if you want to use this JavaScript snippet in a website.
- In Chrome go to a website you want to extract links from, like https://wordpress.org/.
- Follow the first three steps under the Original Information section above.
- Inside the Console panel paste the JavaScript below and press Enter:
  var urls = document.getElementsByTagName('a');
  for (var i = 0; i < urls.length; i++) {
    console.log(urls[i].getAttribute('href'));
  }
- Or if on https://wordpress.org/ you want to find all links that contain specific text (like “showcase”) in the URL, use the code below, which also works in older browsers (like Internet Explorer 9 and above):
  var urls = document.getElementsByTagName('a');
  for (var i = 0; i < urls.length; i++) {
    var href = urls[i].getAttribute('href');
    // href is null for anchors without an href attribute, so check it first.
    if (href && href.indexOf('showcase') > -1) { console.log(href); }
  }
- Or if you want to use modern code that works in the Google Chrome browser but not in very old browsers (it will not work in Internet Explorer at all), use the code below to find all links that contain specific text (like “showcase”) on https://wordpress.org/:
  let urls = document.getElementsByTagName('a');
  for (let i = 0; i < urls.length; i++) {
    const href = urls[i].getAttribute('href');
    // href is null for anchors without an href attribute, so check it first.
    if (href && href.includes('showcase')) { console.log(href); }
  }
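The question from the comments was about URLs containing “abc” or “defg”, meaning more than one search string. A minimal sketch of that, under my own assumptions, could look like the code below: `matchesAny` and `findLinks` are hypothetical names, the match is case-sensitive, and anchors without an href attribute are skipped.

```javascript
// Hypothetical helper: true when text contains at least one of the search strings.
function matchesAny(text, needles) {
  for (var j = 0; j < needles.length; j++) {
    if (text.indexOf(needles[j]) > -1) { return true; }
  }
  return false;
}

// Hypothetical helper: href attribute values that contain any of the search strings.
function findLinks(anchors, needles) {
  var found = [];
  for (var i = 0; i < anchors.length; i++) {
    var href = anchors[i].getAttribute('href');
    // getAttribute returns null when the anchor has no href attribute.
    if (href !== null && matchesAny(href, needles)) {
      found.push(href);
    }
  }
  return found;
}

// In the Console panel (the check lets the snippet load outside a browser too):
if (typeof document !== 'undefined') {
  console.log(findLinks(document.getElementsByTagName('a'), ['abc', 'defg']));
}
```

It sticks to `var` and `indexOf` so that it should run in the same older browsers as the Internet Explorer 9 example; swap in your own search strings in place of 'abc' and 'defg'.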
But when the same code is written for a Chrome extension, it gives “undefined” as the result.
I updated the code Shwetha. Thanks for reading my blog!
I love your code block in this post. Thus, I’m probably gonna steal it. 🙂
Go for it William Pate!
Hi,
This is awesome. How would I return only certain URLs?
For example, I want to return only URLs containing “abc” or “defg”.
Hi Calvin,
See my answer under the “Updated (April 1, 2021)” heading above. That includes the information you need.
Great code, however I have a problem with it. I want to list all e-mail addresses from a website, but after replacing “showcase” with “mailto:” I’m getting an error: “Uncaught TypeError: Cannot read property ‘includes’ of null
at :3:34”. Is there a way to make it work?
OK, YouTube comment section under the original video solved it for me 😉 Below code works just fine:
filteredString = 'mailto:';
urls = $$('a'); for (url in urls) if (urls[url].href.toLowerCase().includes(filteredString)) console.log ( urls[url].href );
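I assumed you were using jQuery, Sebastian, when you wrote:
urls = $$('a');
In the Chrome Console, though, “$$” is a built-in shortcut for document.querySelectorAll, so that line works there as written. Only if the code runs in a website that loads jQuery would you remove one “$” character so it looks like:
urls = $('a');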
Hi Matt,
Thanks for this code. I am trying to extract all the requests (document requests, XHR requests, resource requests) along with the start time, complete time, and load time for each request.
Please let me know, if I can achieve this.
Praveen T
Hi Praveen,
Unfortunately I don’t know how to do this. Good luck with a Google search on how to do this.
Is there a way to change the URL in the background? For example:
link from webpage
https://www[dot]facebook[dot]com
needs to be converted to the following format
https://example[dot]com?url=https://facebook%5Bdot%5Dcom
i know this is old but it was 1 in google serp lol
hey how would i extract links say from tik tok comments
from only the comment div block
Thank you so much for this code. You saved me a ton of time. A website linked 50+ pdf files individually. Your code, along with another fella from NOAA, helped me avoid right clicking each one to download. Instead a simple scrape of the pdf links, and a script through command prompt to download the list of links- voila! 50 pdfs downloaded in a matter of seconds.- using stock Windows nonetheless!
Thank you again!!!!