Extracting "hidden" Html With Jsoup
Solution 1:
The data seems to be loaded with AJAX. Jsoup does not execute JavaScript.
What you need is a "headless browser" API that processes JavaScript without actually rendering anything.
HtmlUnit seems to be the best-known tool, although I've never used it myself. As suggested before, Selenium WebDriver is also an option.
I believe you will have to load the URL and wait for all the AJAX requests to finish. You should then get almost the same parse tree in Java that you see in Chrome, to do with as you wish.
Solution 2:
If this is the only information you need, here's the JSON URL to the information you seek:
This was found by inspecting the Network tab of the Chrome developer tools, and you can get the contents of this URL by using HttpConnection. An example can be found here. After getting the JSON file, you can parse it to retrieve whatever information you need.
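As a minimal sketch of the fetch step, assuming the JDK's java.net.HttpURLConnection is an acceptable way to open the connection, the JSON could be downloaded as text along these lines. The class name JsonFetcher and the helpers readBody and fetch are hypothetical names chosen for illustration, and the URL is read from the command line because the real address is whatever you copy from the Network tab.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Jsoup or any other library.
final class JsonFetcher {

    // Reads the whole stream into a String, decoding the bytes as UTF-8.
    static String readBody(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    // Sends a GET request and returns the response body as text.
    // Assumption: the endpoint answers a plain GET with HTTP 200 and a UTF-8 body.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("GET");
        conn.setRequestProperty("Accept", "application/json");
        conn.setConnectTimeout(10_000);
        conn.setReadTimeout(10_000);
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status: " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return readBody(in);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JsonFetcher <json-url>");
            return;
        }
        // args[0] is the JSON URL copied from the browser's Network tab.
        System.out.println(fetch(args[0]));
    }
}
```

The returned string is raw JSON text. Turning it into objects still requires a JSON parser of your choice, since the JDK does not ship one. Some endpoints also expect the same headers or cookies the browser sent, in which case those would have to be added with setRequestProperty.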