# Capture a screenshot using Puppeteer
## Using Puppeteer directly
To run this example on the Apify Platform, select the apify/actor-node-puppeteer-chrome image for your Dockerfile.
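For reference, such a Dockerfile might look like the following. This is a minimal sketch, not the official template: the `CMD` target and file layout are assumptions for illustration.

```dockerfile
# Base the actor on the image that bundles Node.js, Puppeteer, and Chrome
FROM apify/actor-node-puppeteer-chrome

# Install production dependencies first to leverage Docker layer caching
COPY package*.json ./
RUN npm install --omit=dev

# Copy the rest of the source code
COPY . ./

# Entry point file name is an assumption; adjust to your project
CMD ["node", "main.js"]
```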
This example captures a screenshot of a web page using Puppeteer. It would look almost exactly the same with Playwright.
Using `page.screenshot()`:
```js
import { KeyValueStore, launchPuppeteer } from 'crawlee';

const keyValueStore = await KeyValueStore.open();
const url = 'https://crawlee.dev';

// Start a browser
const browser = await launchPuppeteer();

// Open new tab in the browser
const page = await browser.newPage();

// Navigate to the URL
await page.goto(url);

// Capture the screenshot
const screenshot = await page.screenshot();

// Save the screenshot to the default key-value store
await keyValueStore.setValue('my-key', screenshot, { contentType: 'image/png' });

// Close Puppeteer
await browser.close();
```
Using `utils.puppeteer.saveSnapshot()`:
```js
import { launchPuppeteer, utils } from 'crawlee';

const url = 'https://www.example.com/';

// Start a browser
const browser = await launchPuppeteer();

// Open new tab in the browser
const page = await browser.newPage();

// Navigate to the URL
await page.goto(url);

// Capture the screenshot
await utils.puppeteer.saveSnapshot(page, { key: 'my-key', saveHtml: false });

// Close Puppeteer
await browser.close();
```
## Using PuppeteerCrawler
This example captures a screenshot of multiple web pages using `PuppeteerCrawler`:
Using `page.screenshot()`:
```js
import { PuppeteerCrawler, KeyValueStore } from 'crawlee';

// Create a PuppeteerCrawler
const crawler = new PuppeteerCrawler({
    async requestHandler({ request, page }) {
        // Capture the screenshot with Puppeteer
        const screenshot = await page.screenshot();
        // Convert the URL into a valid key
        const key = request.url.replace(/[:/]/g, '_');
        // Save the screenshot to the default key-value store
        await KeyValueStore.setValue(key, screenshot, { contentType: 'image/png' });
    },
});

await crawler.addRequests([
    { url: 'https://www.example.com/page-1' },
    { url: 'https://www.example.com/page-2' },
    { url: 'https://www.example.com/page-3' },
]);

// Run the crawler
await crawler.run();
```
Using the context-aware `saveSnapshot()` utility:
```js
import { PuppeteerCrawler } from 'crawlee';

// Create a PuppeteerCrawler
const crawler = new PuppeteerCrawler({
    async requestHandler({ request, saveSnapshot }) {
        // Convert the URL into a valid key
        const key = request.url.replace(/[:/]/g, '_');
        // Capture the screenshot
        await saveSnapshot({ key, saveHtml: false });
    },
});

await crawler.addRequests([
    { url: 'https://www.example.com/page-1' },
    { url: 'https://www.example.com/page-2' },
    { url: 'https://www.example.com/page-3' },
]);

// Run the crawler
await crawler.run();
```
To take a full-page screenshot with Puppeteer, pass the `fullPage` option as `true` to `screenshot()`: `page.screenshot({ fullPage: true })`.
In both examples using `page.screenshot()`, a `key` variable is derived from the URL of the web page by replacing the characters `:` and `/`, which are not allowed in key-value store keys. This variable is then used as the key when saving each screenshot into the key-value store.