VBA to get the href value

Multi tool use
VBA to get the href value
I am writing macro to extract the href value from a website, example here is to extract the value: '/listedco/listconews/SEHK/2015/0429/LTN201504291355_C.pdf' from the html code below. The href is one of the attributes of the html tag 'a', I have add the code getElementbyTagName'a' but it did not work, my question is how to extract that href value to column L. Anyone could help? Thanks in advance!
<a id="ctl00_gvMain_ctl03_hlTitle" class="news" href="/listedco/listconews/SEHK/2015/0429/LTN201504291355_C.pdf" target="_blank">二零一四年年報</a>
Sub Download_From_HKEX()
Dim internetdata As Object
Dim div_result As Object
Dim header_links As Object
Dim link As Object
Dim URL As String
Dim IE As Object
Dim i As Object
Dim ieDoc As Object
Dim selectItems As Variant
Dim h As Variant
Dim LocalFileName As String
Dim B As Boolean
Dim ErrorText As String
Dim x As Variant
'Key Ratios
For x = 1 To 1579
Set IE = New InternetExplorerMedium
IE.Visible = True
URL = "http://www.hkexnews.hk/listedco/listconews/advancedsearch/search_active_main_c.aspx"
IE.navigate URL
Do
DoEvents
Loop Until IE.readyState = 4
Application.Wait (Now + TimeValue("0:00:05"))
Call IE.Document.getElementById("ctl00_txt_stock_code").setAttribute("value", Worksheets("Stocks").Cells(x, 1).Value)
Set selectItems = IE.Document.getElementsByName("ctl00$sel_tier_1")
For Each i In selectItems
i.Value = "4"
i.FireEvent ("onchange")
Next i
Set selectItems = IE.Document.getElementsByName("ctl00$sel_tier_2")
For Each i In selectItems
i.Value = "159"
i.FireEvent ("onchange")
Next i
Set selectItems = IE.Document.getElementsByName("ctl00$sel_DateOfReleaseFrom_d")
For Each i In selectItems
i.Value = "01"
i.FireEvent ("onchange")
Next i
Set selectItems = IE.Document.getElementsByName("ctl00$sel_DateOfReleaseFrom_m")
For Each i In selectItems
i.Value = "04"
i.FireEvent ("onchange")
Next i
Set selectItems = IE.Document.getElementsByName("ctl00$sel_DateOfReleaseFrom_y")
For Each i In selectItems
i.Value = "1999"
i.FireEvent ("onchange")
Next i
Application.Wait (Now + TimeValue("0:00:02"))
Set ieDoc = IE.Document
With ieDoc.forms(0)
Call IE.Document.parentWindow.execScript("document.forms[0].submit()", "JavaScript")
.submit
End With
Application.Wait (Now + TimeValue("0:00:03"))
'Start here to extract the href value.
Set internetdata = IE.Document
Set div_result = internetdata.getElementById("ctl00_gvMain_ctl03_hlTitle")
Set header_links = div_result.getElementsByTagName("a")
For Each h In header_links
Set link = h.ChildNodes.Item(0)
Worksheets("Stocks").Cells(Range("L" & Rows.Count).End(xlUp).Row + 1, 12) = link.href
Next
Next x
End Sub
What is the problem you have encountered? It's not clear from your question. Edit your question to elaborate.
– omegastripes
Sep 20 '15 at 20:18
div_result.getElementsByClassName("a")
>> div_result.getElementsByTagName("a")
– Tim Williams
Sep 21 '15 at 0:26
div_result.getElementsByClassName("a")
div_result.getElementsByTagName("a")
@TimWilliams Hi Tim, thanks for your answer but sorry it is just a typo. I did tried getElementsByTagName, as well as getElementsByClassName("news"), they did not work, a solution may be get the attribute "href" after getting the TagName"a", since the "href" is one of the attributes of tag "a". But I don't know the code to get attributes, could you help?
– Nicholas Kan
Sep 21 '15 at 1:02
Your anchor element with target href has
id="ctl00_gvMain_ctl03_hlTitle
, so you can retrirve url IE.document.getElementById("ctl00_gvMain_ctl03_hlTitle").href
or simply IE.document.ctl00_gvMain_ctl03_hlTitle.href
. Also try to retrieve the data you need via XHR instead of IE.– omegastripes
Sep 21 '15 at 2:55
id="ctl00_gvMain_ctl03_hlTitle
IE.document.getElementById("ctl00_gvMain_ctl03_hlTitle").href
IE.document.ctl00_gvMain_ctl03_hlTitle.href
3 Answers
3
For Each h In header_links
Worksheets("Stocks").Cells(Range("L" & Rows.Count).End(xlUp).Row + 1, 12) = h.href
Next
EDIT: The id
attribute is supposed to be unique in the document: there should only be a single element with any given id. So
id
IE.Document.getElementById("ctl00_gvMain_ctl03_hlTitle").href
should work.
@Tim Williams, Hi Tim, your code works for extracting all the href links on the website but what I want to extract is only those with
id="ctl00_gvMain_ctl03_hlTitle"
, the href attribute is one of those attributes in the tag <a>, I think something like getAttribute would work but I don't know the coding for that, could you help again? Thanks for your patience!– Nicholas Kan
Sep 21 '15 at 5:40
id="ctl00_gvMain_ctl03_hlTitle"
@Tim Hi Tim, I am currently using your firs answer which is
h.href
method, it works perfectly. But I did not test your updated answer. However, thanks a lot!– Nicholas Kan
Sep 22 '15 at 15:57
h.href
WB.Document.GetElementById("ctl00_gvMain_ctl04_hlTitle").GetAttribute("href").ToString
Use a CSS selector to get the element then access its href
attribute.
href
#ctl00_gvMain_ctl03_hlTitle
The above is element with id ctl00_gvMain_ctl03_hlTitle
. "#"
means id.
id ctl00_gvMain_ctl03_hlTitle
"#"
Debug.Print IE.document.querySelector("#ctl00_gvMain_ctl03_hlTitle").getAttribute("href").innerText
By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.
Is there anyone could help?
– Nicholas Kan
Sep 20 '15 at 12:47