
Puppeteer Not Behaving Like In Developer Console

I am trying to use Puppeteer to extract the title of this page: https://www.nordstrom.com/s/zella-high-waist-studio-pocket-7-8-leggings/5460106 I have the below code, (asy

Solution 1:

If you only need the innerText of the title element, you can get it with Puppeteer's page.$eval method:

const title = await page.$eval('title', el => el.innerText)
console.log(title)

Output:

Zella High Waist Studio Pocket 7/8 Leggings | Nordstrom

page.$eval(selector, pageFunction[, ...args])

The page.$eval method runs document.querySelector(selector) within the page and passes the matched element as the first argument to pageFunction.
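As a hedged illustration of the ...args part of that signature: any extra arguments given to page.$eval are forwarded to pageFunction after the matched element. The helper name readAttribute is hypothetical, and page is assumed to be a Puppeteer page that has already navigated somewhere.

```javascript
// Hypothetical helper: reads one attribute of the first element matching `selector`.
// `page` is assumed to be an already opened Puppeteer Page.
async function readAttribute(page, selector, attribute) {
  // `attribute` is serialized, sent into the page, and arrives in the callback as `name`.
  return page.$eval(selector, (el, name) => el.getAttribute(name), attribute)
}

// Possible usage (inside an async function):
//   const ogTitle = await readAttribute(page, 'meta[property="og:title"]', 'content')
```

Keep in mind that page.$eval rejects when no element matches the selector, so a caller may want to wrap it in try/catch.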


However, your main problem is that the page you are visiting is a single-page app (SPA) built with React, and its title is filled in dynamically by the JavaScript bundle. When your script runs, Puppeteer finds a valid title element in the <head>, but its content is still "" (an empty string).

Normally, for SPAs you should use waitUntil: 'networkidle0' to make sure the JS framework has populated the DOM and the page is fully functional:

await page.goto('https://www.nordstrom.com/s/zella-high-waist-studio-pocket-7-8-leggings/5460106', {
  waitUntil: 'networkidle0'
})

Unfortunately, with this specific website that throws a timeout error, because the network connections don't close before the default 30000 ms timeout. Something seems to be wrong on the page's frontend side (web worker handling?).
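One way to cope with such a timeout, sketched here under assumptions, is to catch it and carry on with whatever has loaded so far. The helper name gotoTolerant and its return values are hypothetical, and the check on err.name assumes that Puppeteer's timeout errors are named 'TimeoutError'.

```javascript
// Hypothetical helper: try the strict 'networkidle0' wait, but do not abort the
// whole run if the page never becomes network-idle.
async function gotoTolerant(page, url, timeoutMs = 30000) {
  try {
    await page.goto(url, { waitUntil: 'networkidle0', timeout: timeoutMs })
    return 'idle'
  } catch (err) {
    // Assumption: timeout errors thrown by Puppeteer carry the name 'TimeoutError'.
    if (err && err.name === 'TimeoutError') {
      return 'timed-out' // the document may still be usable at this point
    }
    throw err // anything else (bad URL, DNS failure, ...) is a real error
  }
}
```

After a 'timed-out' result the title may or may not be populated yet, so it is still worth checking that it is non-empty before using it.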

As a workaround, you can force Puppeteer to sleep for 8 seconds with await page.waitFor(8000) before you try to retrieve the title; by that time it will be properly populated. Your script works in the DevTools Console because you are not running it immediately: by the time you do, the page is already fully loaded and the DOM is populated.
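A fixed sleep can wait too long or not long enough, and page.waitFor has been deprecated in later Puppeteer releases. An alternative sketch, assuming the title simply starts empty and is filled in later, waits for that condition directly with page.waitForFunction. The helper name getTitleWhenReady and the 15 second default are hypothetical choices.

```javascript
// Hypothetical helper: resolves with the document title once it is no longer empty.
// `page` is assumed to be a Puppeteer Page that has already navigated to the URL.
async function getTitleWhenReady(page, timeoutMs = 15000) {
  // The callback runs inside the page and is re-evaluated until it returns a truthy value.
  await page.waitForFunction(() => document.title.trim().length > 0, { timeout: timeoutMs })
  return page.title()
}

// Possible usage (inside an async function, after page.goto):
//   console.log(await getTitleWhenReady(page))
```

If the title never appears, waitForFunction rejects after timeoutMs, which makes the failure explicit instead of silently returning an empty string.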

This script will return the expected title:

const puppeteer = require('puppeteer')

async function fn() {
  const browser = await puppeteer.launch({ headless: false })
  const page = await browser.newPage()

  await page.goto('https://www.nordstrom.com/s/zella-high-waist-studio-pocket-7-8-leggings/5460106', {
    waitUntil: 'networkidle2'
  })
  await page.waitFor(8000)

  const title = await page.$eval('title', el => el.innerText)
  console.log(title)

  await browser.close()
}
fn()

Launching the browser with headless: false, as in const browser = await puppeteer.launch({ headless: false }), may affect the result as well.

Solution 2:

When navigating to the page, wait until the page has loaded:

await page.goto(req.params[0], { waitUntil: "networkidle2" }); //this is the url

Then could you try this:

try {
    title = await page.evaluate(() => {
        const title = document.title;
        // == null matches undefined as well as null, but not an undeclared variable
        const isTitleThere = title == null ? false : true
        return { "title": title, "isTitleThere": isTitleThere }
    })

} catch (error) {
    console.log(error, 'There was an error');

}
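A small sketch of what the == null check in the snippet above does and does not catch. The function name isMissing is hypothetical. Note that an empty string is not treated as missing, so a title that has not been populated yet would still count as "there".

```javascript
// Loose equality with null is true for exactly two values: null and undefined.
function isMissing(value) {
  return value == null
}

console.log(isMissing(null))      // true
console.log(isMissing(undefined)) // true
console.log(isMissing(''))        // false: an empty title still passes the check
console.log(isMissing(0))         // false
```

For the empty-title case described in Solution 1, a check such as title.length > 0 may therefore be more useful than a null check.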

or this

try {
    title = await page.evaluate(() => {
        // Read the content attribute: a DOM element itself cannot be returned from page.evaluate
        const meta = document.querySelector('meta[property="og:title"]');
        const title = meta == null ? null : meta.content;
        // == null matches undefined as well as null, but not an undeclared variable
        const isTitleThere = title == null ? false : true
        return { "title": title, "isTitleThere": isTitleThere }
    })
} catch (error) {
    console.log(error, 'There was an error');
}
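The two variants can also be combined. The following is only a sketch: the helper name extractTitle is hypothetical, and it assumes the page may expose an og:title meta tag as a fallback when document.title is still empty.

```javascript
// Hypothetical helper: prefer document.title, fall back to the og:title meta tag.
// `page` is assumed to be a Puppeteer Page that has already navigated to the URL.
async function extractTitle(page) {
  return page.evaluate(() => {
    const meta = document.querySelector('meta[property="og:title"]')
    const fromMeta = meta ? meta.getAttribute('content') : null
    // Only plain, serializable values can be returned from page.evaluate.
    return document.title || fromMeta || null
  })
}
```

A null result then means that neither source had a usable title at the moment of the call.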
