Saving Web Page Content as a File in Java

Since Java 11, the standard library includes java.net.http.HttpClient, so you can fetch web page content and save it as a file without any third-party HTTP library. In this blog post, we’ll explore how to do this with a few practical examples: a synchronous download, an asynchronous download, and a JUnit test that exercises both. Let’s get started!

Prerequisites

You need JDK 11 or later. If you don’t already have Maven installed, you can download it from the official Maven website https://maven.apache.org/download.cgi or install it through SDKMAN https://sdkman.io/sdks#maven

The complete example is available in the https://github.com/dmakariev/examples repository, which you can clone:

git clone https://github.com/dmakariev/examples.git
cd examples/java-core/httpclient

Creating a Maven Project

Let’s create our project:

  1. Open your terminal and navigate to the directory where you want to create your project.
  2. Run the following command to generate a new Maven project:
    mvn archetype:generate -DgroupId=com.makariev.examples.core -DartifactId=httpclient -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

    This command generates a basic Maven project structure with a sample Java class and a sample test, using the group ID and artifact ID given on the command line.

Deleting Initial Files and Updating Dependencies

To clean up the initial files generated by the Maven archetype and update dependencies, follow these steps:

  1. Delete the src/main/java/com/makariev/examples/core/App.java file.
  2. Delete the src/test/java/com/makariev/examples/core/AppTest.java file.
  3. Open the pom.xml file and delete the JUnit 3 dependency (junit:junit).
  4. Add the JUnit 5 and AssertJ dependencies to the pom.xml file:
<dependencies>
    <!-- JUnit 5 -->
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-api</artifactId>
        <version>5.10.0</version> <!-- Use the latest version -->
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.junit.jupiter</groupId>
        <artifactId>junit-jupiter-engine</artifactId>
        <version>5.10.0</version> <!-- Use the latest version -->
        <scope>test</scope>
    </dependency>
    <!-- AssertJ -->
    <dependency>
        <groupId>org.assertj</groupId>
        <artifactId>assertj-core</artifactId>
        <version>3.24.2</version> <!-- Use the latest version -->
        <scope>test</scope>
    </dependency>
</dependencies>

Depending on the archetype version, the generated pom.xml may target a Java release older than 11. If it does, set the maven.compiler.source and maven.compiler.target properties to 11 or later, otherwise the HttpClient code will not compile. Also note that Maven only discovers JUnit 5 tests when maven-surefire-plugin is version 2.22.0 or later.

Fetching Web Page Content with HttpClient

Here is the link to the javadoc for HttpClient https://docs.oracle.com/en/java/javase/11/docs/api/java.net.http/java/net/http/HttpClient.html

An HttpClient can be used to send requests and retrieve their responses. An HttpClient is created through a builder. The builder can be used to configure per-client state, such as the preferred protocol version (HTTP/1.1 or HTTP/2), whether to follow redirects, a proxy, and an authenticator. Once built, an HttpClient is immutable and can be used to send multiple requests.

An HttpClient provides configuration information, and resource sharing, for all requests sent through it.
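
To make the builder description concrete, here is a minimal sketch of a configured client. The class name ClientConfigSketch, the helper buildClient, and the 10-second timeout are illustrative assumptions, not part of the original example project.

```java
import java.net.http.HttpClient;
import java.time.Duration;

public class ClientConfigSketch {

    // Hypothetical helper: builds one immutable client that can be shared by many requests.
    static HttpClient buildClient() {
        return HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_2)           // prefer HTTP/2 where the server supports it
                .followRedirects(HttpClient.Redirect.NORMAL)  // follow redirects, except from HTTPS to HTTP
                .connectTimeout(Duration.ofSeconds(10))       // illustrative value, tune for your use case
                .build();
    }

    public static void main(String[] args) {
        HttpClient client = buildClient();
        System.out.println(client.version());          // HTTP_2
        System.out.println(client.followRedirects());  // NORMAL
        System.out.println(client.connectTimeout());   // Optional[PT10S]
    }
}
```

Because the client is immutable after build(), it is usually reasonable to create it once and reuse it rather than building a new one per request.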

A BodyHandler must be supplied for each HttpRequest sent. The BodyHandler determines how to handle the response body, if any. Once an HttpResponse is received, the headers, response code, and body (typically) are available. Whether the response body bytes have been read or not depends on the type, T, of the response body.
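
As an illustration of how the BodyHandler shapes the response, the sketch below uses BodyHandlers.ofFile, which writes the body to a file so that the response body type T is a Path. To keep it runnable without internet access, it fetches from a throwaway local server built with the JDK’s com.sun.net.httpserver.HttpServer. The class BodyHandlerSketch and its helper methods are hypothetical names chosen for this sketch.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BodyHandlerSketch {

    // Hypothetical helper: lets the BodyHandler write the response body into target.
    static Path saveToFile(HttpClient client, URI uri, Path target)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder().uri(uri).build();
        // With ofFile, response.body() is the Path of the written file.
        // Note that the body is written to the file whatever the status code is.
        HttpResponse<Path> response = client.send(request, HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    // Starts a local stand-in server, downloads its single page, and returns the saved text.
    static String fetchFromLocalServer() throws IOException, InterruptedException {
        byte[] page = "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8);
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            exchange.sendResponseHeaders(200, page.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(page);
            }
        });
        server.start();
        try {
            // The stand-in server only speaks HTTP/1.1, so request that version explicitly.
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .build();
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            Path target = Files.createTempFile("page", ".html");
            return Files.readString(saveToFile(client, uri, target));
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fetchFromLocalServer());
    }
}
```

Compared with BodyHandlers.ofByteArray, this approach avoids holding the whole page in memory, which may matter for large downloads.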

Requests can be sent either synchronously or asynchronously. The two examples that follow show each approach.

1. Fetching and Saving Web Page Content Synchronously

// src/main/java/com/makariev/examples/core/WebContentDownloader.java
package com.makariev.examples.core;

import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

public class WebContentDownloader {

    // Blocks until the response arrives, then writes the body to savePath.
    public static void downloadWebPageContentSynchronously(String url, String savePath) throws IOException, InterruptedException {
        HttpClient httpClient = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .build();

        HttpResponse<byte[]> response = httpClient.send(request, HttpResponse.BodyHandlers.ofByteArray());

        // Fail loudly instead of silently skipping the write on a non-200 response.
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        Files.write(Path.of(savePath), response.body());
    }
}

2. Fetching and Saving Web Page Content Asynchronously

// This method belongs in the same WebContentDownloader class as the synchronous one.
package com.makariev.examples.core;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;

public class WebContentDownloader {

    // Returns immediately; the future completes once the file has been written.
    public static CompletableFuture<Void> downloadWebPageContentAsynchronously(String url, String savePath) {
        HttpClient httpClient = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .build();

        return httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofByteArray())
                .thenAccept(response -> {
                    try {
                        if (response.statusCode() != 200) {
                            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + url);
                        }
                        Files.write(Path.of(savePath), response.body());
                    } catch (IOException e) {
                        // Rethrow unchecked so the returned future completes exceptionally.
                        throw new UncheckedIOException(e);
                    }
                });
    }
}
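
One detail worth watching in asynchronous code is error reporting: a checked IOException cannot escape a lambda directly, and if it is only logged, the returned future still completes normally and the caller never learns that the write failed. The sketch below shows one possible approach, rethrowing as UncheckedIOException so the failure surfaces when the caller waits on the future. It uses CompletableFuture.supplyAsync as a stand-in for sendAsync so that it runs without network access; AsyncErrorSketch and saveAsync are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class AsyncErrorSketch {

    // Hypothetical helper: writes the bytes produced by an earlier asynchronous stage.
    static CompletableFuture<Path> saveAsync(CompletableFuture<byte[]> body, Path target) {
        return body.thenApply(bytes -> {
            try {
                return Files.write(target, bytes);
            } catch (IOException e) {
                // Makes the returned future complete exceptionally.
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the response body that sendAsync would deliver.
        CompletableFuture<byte[]> body =
                CompletableFuture.supplyAsync(() -> "hello".getBytes(StandardCharsets.UTF_8));
        Path dir = Files.createTempDirectory("sketch");

        Path ok = saveAsync(body, dir.resolve("ok.html")).join();
        System.out.println("saved " + Files.size(ok) + " bytes"); // saved 5 bytes

        try {
            // The parent directory does not exist, so the write fails.
            saveAsync(body, dir.resolve("missing-dir").resolve("fail.html")).join();
        } catch (CompletionException e) {
            System.out.println("failed: " + e.getCause().getClass().getSimpleName()); // failed: UncheckedIOException
        }
    }
}
```

With join() the failure arrives wrapped in a CompletionException; with get() it arrives wrapped in an ExecutionException instead.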

JUnit 5 Test - FetchContentHttpClientExampleTest

Now, let’s create a single JUnit 5 test class called FetchContentHttpClientExampleTest.java in the src/test/java/com/makariev/examples/core directory to demonstrate both the synchronous and asynchronous examples. Both tests download https://example.com, so they need network access to pass.

package com.makariev.examples.core;

import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.io.TempDir;
import static org.assertj.core.api.Assertions.assertThat;

import java.io.IOException;
import java.nio.file.Path;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class FetchContentHttpClientExampleTest {

    // Each test gets its own temporary directory, so one test cannot pass
    // on a file left behind by the other.
    @Test
    void testFetchWebPageContentSynchronously(@TempDir Path tempDir) throws IOException, InterruptedException {
        String url = "https://example.com";
        Path file = tempDir.resolve("example.html");

        WebContentDownloader.downloadWebPageContentSynchronously(url, file.toString());

        assertThat(file.toFile().exists()).isTrue();
        assertThat(file.toFile().length()).isGreaterThan(0);
    }

    @Test
    void testFetchWebPageContentAsynchronously(@TempDir Path tempDir) throws ExecutionException, InterruptedException {
        String url = "https://example.com";
        Path file = tempDir.resolve("example.html");

        CompletableFuture<Void> future = WebContentDownloader.downloadWebPageContentAsynchronously(url, file.toString());
        future.get(); // Wait for the asynchronous operation to complete

        assertThat(file.toFile().exists()).isTrue();
        assertThat(file.toFile().length()).isGreaterThan(0);
    }
}

Running the Test

To run the test, execute the following command in the project’s root directory:

mvn test

Maven runs the JUnit 5 tests, and the output shows whether each test passed or failed.

Conclusion

In this blog post, we’ve explored how to use Java’s HttpClient to fetch web page content and save it as a file. We provided both synchronous and asynchronous examples to cater to different use cases. We created a JUnit test called FetchContentHttpClientExampleTest to showcase these examples.


Coffee Time!

Happy coding!
