Version 4.4.0 / July 28, 2024
❤️ Sponsor
HtmlUnit@mastodon | HtmlUnit@Twitter
Check out HtmlUnit satellite projects, such as:
- HtmlUnit on android
- HtmlUnit for .Net
- or our Rhino fork (the JS engine)
Note as well that you can use HtmlUnit with Selenium via their htmlunit-driver!
Constantly updating and maintaining the HtmlUnit code base already takes a lot of time.
I would like to make 2 major extensions in the next few months.
To do this, I need your sponsorship.
Add to your pom.xml:
<dependency>
    <groupId>org.htmlunit</groupId>
    <artifactId>htmlunit</artifactId>
    <version>4.4.0</version>
</dependency>
Add to your build.gradle:
implementation group: 'org.htmlunit', name: 'htmlunit', version: '4.4.0'
HtmlUnit is a "GUI-less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc., just like you do in your "normal" browser.
It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating Chrome, Firefox or Internet Explorer, depending on the configuration used.
HtmlUnit is typically used for testing purposes or to retrieve information from web sites.
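As a minimal sketch of "invoking a page", the class below loads a page with a `WebClient` configured to simulate Chrome and reads the title after the page's inline JavaScript has run. To stay self-contained it serves a made-up page through `MockWebConnection` instead of a real server; the URL `http://example.com/`, the markup and the class name `FirstPageDemo` are illustrative assumptions, not part of HtmlUnit.

```java
import java.net.URI;
import java.net.URL;

import org.htmlunit.BrowserVersion;
import org.htmlunit.MockWebConnection;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

public class FirstPageDemo {

    // Loads a canned page through a MockWebConnection (no network access needed)
    // and returns its title after the page's inline JavaScript has run.
    public static String loadTitle() throws Exception {
        // Simulate Chrome; other predefined versions exist, e.g. BrowserVersion.FIREFOX.
        try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
            // Hypothetical URL; the mock connection answers it instead of a real server.
            URL url = URI.create("http://example.com/").toURL();
            MockWebConnection connection = new MockWebConnection();
            connection.setResponse(url,
                "<html><head><title>Static title</title></head><body>"
                + "<script>document.title = 'Set by JavaScript';</script>"
                + "</body></html>");
            webClient.setWebConnection(connection);

            HtmlPage page = webClient.getPage(url);
            return page.getTitleText();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(loadTitle());
    }
}
```

Against a real site you would skip the mock connection and pass the site's URL directly to `getPage`.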
- Support for the HTTP and HTTPS protocols
- Support for cookies
- Ability to specify whether failing responses from the server should throw exceptions or should be returned as pages of the appropriate type (based on content type)
- Support for submit methods POST and GET (as well as HEAD, DELETE, ...)
- Ability to customize the request headers being sent to the server
- Support for HTML responses
- Wrapper for HTML pages that provides easy access to all information contained inside them
- Support for submitting forms
- Support for clicking links
- Support for walking the DOM model of the HTML document
- Proxy server support
- Support for basic and NTLM authentication
- Excellent JavaScript support
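A few of the features above (submitting forms, custom request headers, and choosing whether failing responses throw exceptions) can be sketched together. This is an illustrative sketch, not an official example: the form markup, the URLs under `http://example.com/`, the `X-Demo` header and the class name are assumptions, and `MockWebConnection` again stands in for a real server so the code runs offline.

```java
import java.net.URI;
import java.util.Collections;

import org.htmlunit.FailingHttpStatusCodeException;
import org.htmlunit.MockWebConnection;
import org.htmlunit.Page;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlForm;
import org.htmlunit.html.HtmlPage;
import org.htmlunit.html.HtmlSubmitInput;
import org.htmlunit.html.HtmlTextInput;

public class FormAndStatusDemo {

    // Hypothetical page containing a simple GET form.
    static final String FORM_PAGE =
        "<html><body><form name='search' action='/results' method='get'>"
        + "<input type='text' name='q'>"
        + "<input type='submit' name='go' value='Search'>"
        + "</form></body></html>";

    // Creates a client whose requests are answered by the given mock connection.
    static WebClient newClient(MockWebConnection connection) throws Exception {
        connection.setResponse(URI.create("http://example.com/").toURL(), FORM_PAGE);
        // A 404 response for one URL, to demonstrate failing status codes.
        connection.setResponse(URI.create("http://example.com/missing").toURL(),
            "<html><body>not found</body></html>", 404, "Not Found", "text/html",
            Collections.emptyList());
        // Every other URL (e.g. the form target /results) gets this page.
        connection.setDefaultResponse("<html><body>results</body></html>");
        WebClient webClient = new WebClient();
        webClient.setWebConnection(connection);
        return webClient;
    }

    // Fills in and submits the form; returns the URL of the resulting page.
    public static String submitSearch(String query) throws Exception {
        try (WebClient webClient = newClient(new MockWebConnection())) {
            HtmlPage page = webClient.getPage("http://example.com/");
            HtmlForm form = page.getFormByName("search");
            HtmlTextInput field = form.getInputByName("q");
            field.type(query);
            HtmlSubmitInput button = form.getInputByName("go");
            Page result = button.click();
            return result.getUrl().toString();
        }
    }

    // Adds a custom header to every request and returns the value the mock server saw.
    public static String headerSeenByServer() throws Exception {
        MockWebConnection connection = new MockWebConnection();
        try (WebClient webClient = newClient(connection)) {
            webClient.addRequestHeader("X-Demo", "htmlunit");
            webClient.getPage("http://example.com/");
            return connection.getLastAdditionalHeaders().get("X-Demo");
        }
    }

    // With exceptions disabled, the 404 is returned as a normal page.
    public static int statusWithoutException() throws Exception {
        try (WebClient webClient = newClient(new MockWebConnection())) {
            webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
            Page page = webClient.getPage("http://example.com/missing");
            return page.getWebResponse().getStatusCode();
        }
    }

    // With the default options, the 404 surfaces as a FailingHttpStatusCodeException.
    public static int statusFromException() throws Exception {
        try (WebClient webClient = newClient(new MockWebConnection())) {
            webClient.getPage("http://example.com/missing");
            return -1; // not reached with the default options
        } catch (FailingHttpStatusCodeException e) {
            return e.getStatusCode();
        }
    }
}
```

Building a fresh client per method keeps each scenario independent; in real code a single `WebClient` is usually reused across requests so that cookies and other state carry over.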
You can start here:
- Getting Started
- The Java Web Scraping Handbook: a nice tutorial about web scraping, with a lot of background information and details about HtmlUnit.
- Web Scraping Examples: how to implement web scraping using HtmlUnit, Selenium or jaunt, and how they compare.
- The Complete Guide to Web Scraping with Java: a small, straightforward guide to web scraping with Java.
- How to test Jakarta Faces with HtmlUnit and Arquillian
- WebScraping.AI HtmlUnit FAQ
Pull requests and all other community contributions are essential for open source software. Every contribution, from bug reports to feature requests and from typos to full new features, is greatly appreciated.
The latest builds are available from our Jenkins CI build server.
Read on if you want to try the latest bleeding-edge snapshot.
Add the snapshot repository and dependency to your pom.xml:
<!-- ... -->
<repository>
    <id>OSS Sonatype snapshots</id>
    <url>https://s01.oss.sonatype.org/content/repositories/snapshots/</url>
    <snapshots>
        <enabled>true</enabled>
        <updatePolicy>always</updatePolicy>
    </snapshots>
    <releases>
        <enabled>false</enabled>
    </releases>
</repository>
<!-- ... -->
<dependencies>
    <dependency>
        <groupId>org.htmlunit</groupId>
        <artifactId>htmlunit</artifactId>
        <version>4.5.0-SNAPSHOT</version>
    </dependency>
    <!-- ... -->
</dependencies>
<!-- ... -->
Add the snapshot repository and dependency to your build.gradle:
repositories {
    maven { url "https://s01.oss.sonatype.org/content/repositories/snapshots" }
    // ...
}
// ...
dependencies {
    implementation group: 'org.htmlunit', name: 'htmlunit', version: '4.5.0-SNAPSHOT'
    // ...
}
This project is licensed under the Apache 2.0 License
set up or refresh the Eclipse project
mvn eclipse:eclipse -DdownloadSources=true
run the whole core test suite (no huge tests, no library tests)
mvn test -U -P without-library-and-huge-tests -Dgpg.skip -Djava.awt.headless=true
check dependencies for known security problems
mvn dependency-check:check
I welcome contributions, especially in the form of pull requests. Please try to keep your pull requests small (don't bundle unrelated changes) and try to include test cases.