Beautiful Soup: Build a Web Scraper With Python Part 4

Gauloran

Global Moderatör
7 Tem 2013
8,188
636
Extract Attributes From HTML Elements
At this point, your Python script already scrapes the site and filters its HTML for relevant job postings. Well done! However, one thing that’s still missing is the link to apply for a job.

While you were inspecting the page, you found that the link is part of the element that has the title HTML class. The current code strips away the entire link when accessing the .text attribute of its parent element. As you’ve seen before, .text only contains the visible text content of an HTML element. Tags and attributes are not part of that. To get the actual URL, you want to extract one of those attributes instead of discarding it.

Look at the list of filtered results python_jobs that you created above. The URL is contained in the href attribute of the nested <a> tag. Start by fetching the <a> element. Then, extract the value of its href attribute using square-bracket notation:

python_jobs = results.find_all('h2',
string=lambda text: "python" in text.lower())

for p_job in python_jobs:
link = p_job.find('a')['href']
print(p_job.text.strip())
print(f"Apply here: {link}\n")
The filtered results will only show links to job opportunities that include python in their title. You can use the same square-bracket notation to extract other HTML attributes as well. A common use case is to fetch the URL of a link, as you did above.

Building the Job Search Tool
If you’ve written the code alongside this tutorial, then you can already run your script as-is. To wrap up your journey into web scraping, you could give the code a final makeover and create a command line interface app that looks for Software Developer jobs in any ******** you define.

You can check out a command line app version of the code you built in this tutorial at the link below:

Get Sample Code: Click here to get the sample code you’ll use for the project and examples in this tutorial.

If you’re interested in learning how to adapt your script as a command line interface, then check out How to Build Command Line Interfaces in Python With argparse.

Additional Practice
Below is a list of other job boards. These linked pages also return their search results as static HTML responses. To keep practicing your new skills, you can revisit the web scraping process using any or all of the following sites:

PythonJobs
Remote(dot)co
Indeed
Go through this tutorial again from the top using one of these other sites. You’ll see that the structure of each website is different and that you’ll need to re-build the code in a slightly different way to fetch the data you want. This is a great way to practice the concepts that you just learned. While it might make you sweat every so often, your coding skills will be stronger for it!

During your second attempt, you can also explore additional features of Beautiful Soup. Use the ********ation as your guidebook and inspiration. Additional practice will help you become more proficient at web scraping using Python, requests, and Beautiful Soup.

Conclusion
Beautiful Soup is packed with useful functionality to parse HTML data. It’s a trusted and helpful companion for your web scraping adventures. Its ********ation is comprehensive and relatively user-friendly to get started with. You’ll find that Beautiful Soup will cater to most of your parsing needs, from navigating to advanced searching through the results.

In this tutorial, you’ve learned how to scrape data from the Web using Python, requests, and Beautiful Soup. You built a script that fetches job postings from the Internet and went through the full web scraping process from start to finish.

You learned how to:

Inspect the HTML structure of your target site with your browser’s developer tools
Gain insight into how to decipher the data encoded in URLs
Download the page’s HTML content using Python’s requests library
Parse the downloaded HTML with Beautiful Soup to extract relevant information
With this general pipeline in mind and powerful libraries in your toolkit, you can go out and see what other websites you can scrape! Have fun, and remember to always be respectful and use your programming skills responsibly.

This post is quoted​
 
Üst

Turkhackteam.org internet sitesi 5651 sayılı kanun’un 2. maddesinin 1. fıkrasının m) bendi ile aynı kanunun 5. maddesi kapsamında "Yer Sağlayıcı" konumundadır. İçerikler ön onay olmaksızın tamamen kullanıcılar tarafından oluşturulmaktadır. Turkhackteam.org; Yer sağlayıcı olarak, kullanıcılar tarafından oluşturulan içeriği ya da hukuka aykırı paylaşımı kontrol etmekle ya da araştırmakla yükümlü değildir. Türkhackteam saldırı timleri Türk sitelerine hiçbir zararlı faaliyette bulunmaz. Türkhackteam üyelerinin yaptığı bireysel hack faaliyetlerinden Türkhackteam sorumlu değildir. Sitelerinize Türkhackteam ismi kullanılarak hack faaliyetinde bulunulursa, site-sunucu erişim loglarından bu faaliyeti gerçekleştiren ip adresini tespit edip diğer kanıtlarla birlikte savcılığa suç duyurusunda bulununuz.