Using BeautifulSoup to Search Yahoo Finance Statistics Page


I am trying to scrape data out of the Yahoo Finance Statistics Page.
In this instance, it is the "5 Year Average Dividend Yield".
The data that I need is in this type of format.


<tr>
<td>
<span>5 Year Average Dividend Yield</span>
</td>
<td class="Fz(s) Fw(500) Ta(end)">6.16</td>
</tr>



I'm new to BeautifulSoup and I've been trying to read the bs4 documentation, but have had no luck so far.
I just realised that I was parsing through a table. (Yes, I'm a noob.)



Here's my code so far. It successfully prints out all the rows in the table.
I need help with isolating the row that contains "5 Year Average Dividend Yield".
I just need the numerical value in the next column.
Thanks in advance.



New edit: I've placed version 0.8 below, which gets the "5 Year Average Dividend Yield" value that I was looking for.


# Version 0.8 - This worked. It got the value for "5 Year Average Dividend Yield"
# Aim: Find the value for "5 Year Average Dividend Yield".

import csv
import time
from bs4 import BeautifulSoup
from selenium import webdriver

file_path = "C:/temp/temp29/"
file_name = "ASX_20180621_lite.txt"
file_path_name = file_path + file_name
print(file_path_name)

# Phase 1 - place all ticker symbols into a list
tickers_phase1_arr = []

with open(file_path_name, "rt") as incsv:
    readcsv = csv.reader(incsv, delimiter=',')
    for row in readcsv:
        ticker_phase1 = row[0]  # the ticker symbol is in the first column
        ticker_dot_ax = ticker_phase1 + ".AX"
        tickers_phase1_arr.append(ticker_dot_ax)
print(tickers_phase1_arr)


# Phase 2 - look up the statistic for each ticker
key_stats_on_stat = ['5 Year Average Dividend Yield']

# Initialise the browser
browser = webdriver.PhantomJS()

data = {}

for ticker_phase2 in tickers_phase1_arr:
    print(ticker_phase2)
    #time.sleep(5)
    # Set the statistics page url for this ticker
    url = "https://finance.yahoo.com/quote/{0}/key-statistics?p={0}".format(ticker_phase2)
    browser.get(url)
    # Run a script that gets all the html the browser rendered for the page
    innerHTML = browser.execute_script("return document.body.innerHTML")
    # Turn innerHTML into a BeautifulSoup object to make the components easier to access
    soup = BeautifulSoup(innerHTML, 'html.parser')
    # Find the row containing the stat label, then read the value from the next cell
    for stat in key_stats_on_stat:
        page_stat = soup.find(text=stat)
        try:
            page_row = page_stat.find_parent('tr')
            try:
                # Some rows hold the value in a second <span>
                page_statnum = page_row.find_all('span')[1].contents[0]
            except IndexError:
                # Otherwise fall back to the second <td>
                page_statnum = page_row.find_all('td')[1].contents[0]
        except AttributeError:
            print('Invalid parent for this element')
            page_statnum = "N/A"
        print(page_statnum)
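
For reference, the span/td fallback above can be tested without Selenium against the fragment shown earlier. A minimal sketch (wrapping the fragment in a <table> tag is my assumption, since the question only shows a single <tr>):

from bs4 import BeautifulSoup

# The fragment from the question, wrapped in a <table> (assumption) so the
# parser builds a complete table structure.
html = """
<table><tr>
<td><span>5 Year Average Dividend Yield</span></td>
<td class="Fz(s) Fw(500) Ta(end)">6.16</td>
</tr></table>
"""

soup = BeautifulSoup(html, 'html.parser')
page_stat = soup.find(text='5 Year Average Dividend Yield')
page_row = page_stat.find_parent('tr')
try:
    # Some rows put the value in a second <span> ...
    page_statnum = page_row.find_all('span')[1].contents[0]
except IndexError:
    # ... this row keeps it in the second <td> instead.
    page_statnum = page_row.find_all('td')[1].contents[0]
print(page_statnum)  # 6.16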




1 Answer



There are a few ways you can reach the td element containing the desired value, starting from the label in the preceding td element. One of them would be to first get the span element in the first column and then use find_next() to locate the next td element:



tr.find(text='5 Year Average Dividend Yield').find_next('td').get_text()



where tr represents the current row.
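
A runnable sketch of this approach against the fragment from the question (the <table> wrapper is my assumption):

from bs4 import BeautifulSoup

html = """
<table><tr>
<td><span>5 Year Average Dividend Yield</span></td>
<td class="Fz(s) Fw(500) Ta(end)">6.16</td>
</tr></table>
"""

soup = BeautifulSoup(html, 'html.parser')
tr = soup.find('tr')  # the current row

# The label text sits inside the first td; find_next('td') then jumps to the
# td that follows it in document order, i.e. the value cell.
value = tr.find(text='5 Year Average Dividend Yield').find_next('td').get_text()
print(value)  # 6.16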





Another approach might scale a bit better. If you need to do this kind of lookup often, you can construct a dictionary with the text of each first-column element as the key and the corresponding second-column element as the value:


data = {}
for tr in soup.find('table').find_all('tr'):
    first_cell, second_cell = tr.find_all('td')[:2]
    data[first_cell.get_text(strip=True)] = second_cell.get_text(strip=True)



Then, you can query data by the text of the first column:




print(data['5 Year Average Dividend Yield'])
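
A self-contained sketch of this approach against the question's fragment (again wrapped in a <table>, which is my assumption; note that on the live page soup.find('table') returns only the first table, so you may need a more specific selector if the stat lives in a later table):

from bs4 import BeautifulSoup

html = """
<table><tr>
<td><span>5 Year Average Dividend Yield</span></td>
<td class="Fz(s) Fw(500) Ta(end)">6.16</td>
</tr></table>
"""

soup = BeautifulSoup(html, 'html.parser')

data = {}
for tr in soup.find('table').find_all('tr'):
    cells = tr.find_all('td')
    if len(cells) < 2:
        continue  # skip header rows or rows without a value cell
    first_cell, second_cell = cells[:2]
    data[first_cell.get_text(strip=True)] = second_cell.get_text(strip=True)

print(data['5 Year Average Dividend Yield'])  # 6.16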





Hi Alec. Thanks for your help with this one. I tried running the second option, however it wasn't finding any data. However, your comment about getting the span, then the td, got me thinking. I've posted an updated version of the code and it picks up the value that I'm looking for. Cheers, Joe
– JiggidyJoe
Jul 1 at 4:11





