
047 Lack of Streaming Leads to Screaming

Author: Dr. Heinz M. Kabutz
Date: 2002-05-24
Java Version: 1.3
Category: Language
 

Abstract: Instead of reading blobs as one large byte[] with ResultSet.getBytes(), we should rather stream the data using ResultSet.getBinaryStream().

 

Welcome to the 47th edition of The Java(tm) Specialists' Newsletter, read by over 3600 Java programmers in 82 countries.

In the last two weeks I received two questions from readers who ran out of memory when trying to read a big object from the database. In this newsletter I want to explore how you can read a big object from a database without killing your poor JVM, and be scalable as well.


Lack of Streaming Leads to Screaming

How do you retrieve big objects from the database in Java? Say you have a database containing previews of movies in DivX format, stored as IMAGE columns. How do you retrieve the 25 megabyte file from the database using JDBC?

Simple. We write a SELECT statement, execute it, and say result_set.getBytes(1). We run the code and it works well for small movie snippets, but as soon as we have a 25 megabyte file, our poor JVM throws an OutOfMemoryError. What's annoying about an OutOfMemoryError is that the stack trace is not filled in (because hey, you've run out of memory!), so you cannot tell exactly where the error occurred unless you add trace logging. What makes it even trickier is that some JDBC drivers try to be too clever and pull the whole column into memory before you even ask for it, so where the OutOfMemoryError occurs depends on the driver.
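
Here is a minimal sketch of what I mean by trace logging (the class is mine, not one of the test classes below): print a marker before each JDBC call, and the last marker printed before the OutOfMemoryError tells you which call caused it.

import java.sql.*;

// Hypothetical helper, shown only to illustrate trace logging around the
// JDBC calls; table and column names match the MovieArchive example below.
public class TraceBlobFetch {
  public static void fetch(Connection con, String title)
      throws SQLException {
    PreparedStatement st = con.prepareStatement(
      "SELECT moviedata FROM MovieArchive WHERE title = ?");
    st.setString(1, title);
    System.out.println("about to call executeQuery()");
    ResultSet rs = st.executeQuery();
    System.out.println("about to call next()");
    if (rs.next()) {
      System.out.println("about to call getBytes()");
      byte[] data = rs.getBytes(1); // some drivers run out of memory here
      System.out.println("getBytes() returned " + data.length + " bytes");
    }
    st.close();
  }
}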

Let's look at some test code. I have written two test classes, TestDatabaseBlobInsert and TestDatabaseBlobFetch. What surprised me was that of the several drivers that I tested (DataDirect, iNet SPRINTA, Avenir, MS SQL Server Type 4, JDBC/ODBC bridge), the JDBC/ODBC bridge was the fastest for inserting big objects into MS SQL Server. For fetching the data it was the slowest.

import java.sql.DriverManager;
import java.sql.Statement;
import java.sql.SQLException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.io.ByteArrayInputStream;

public class TestDatabaseBlobInsert {
  private static final String TABLE_DROP =
    "DROP TABLE MovieArchive";
  private static final String TABLE_CREATE =
    "Create Table MovieArchive (moviedata image, title varchar(255))";
  private static final String TABLE_INSERT =
    "INSERT INTO MovieArchive (title, moviedata) VALUES (?,?)";

  private static final int size = 25 * 1024 * 1024;
  private final byte[] data = new byte[size];

  private final Connection con;

  public TestDatabaseBlobInsert(String driver, String url,
      String user, String password)
      throws SQLException, ClassNotFoundException {
    Class.forName(driver);
    con = DriverManager.getConnection(url, user, password);
    System.out.println("Driver: " + driver);
    for (int i=0; i<data.length; i++)
      data[i] = (byte)(Math.random()*255);
  }

  public void setUp() throws SQLException {
    Statement st = con.createStatement();
    try {
      System.out.println("Dropping old table");
      st.executeUpdate(TABLE_DROP);
    } catch(SQLException ex) {} // table might not exist
    System.out.println("Creating new table");
    st.executeUpdate(TABLE_CREATE);
    st.close();
  }

  public void testInsertWithBinaryStream() throws SQLException {
    long start = -System.currentTimeMillis();
    System.out.println("Inserting via BinaryStream");
    PreparedStatement stmt = con.prepareStatement(TABLE_INSERT);
    ByteArrayInputStream bis = new ByteArrayInputStream(data);
    stmt.setString(1, "Babe");
    stmt.setBinaryStream(2, bis, data.length);
    stmt.executeUpdate();
    start += System.currentTimeMillis();
    System.out.println("That took " + start + "ms");
    stmt.close();
  }

  public void testInsertWithSetBytes() throws SQLException {
    long start = -System.currentTimeMillis();
    System.out.println("Inserting via setBytes()");
    PreparedStatement stmt = con.prepareStatement(TABLE_INSERT);
    stmt.setString(1, "On Her Majesty's Secret Service");
    stmt.setBytes(2, data);
    stmt.executeUpdate();
    start += System.currentTimeMillis();
    System.out.println("That took " + start + "ms");
    stmt.close();
  }

  public void testAll() throws SQLException {
    setUp();
    testInsertWithBinaryStream();
    testInsertWithSetBytes();
  }

  public static void main(String[] args) throws Exception {
    if (args.length != 4) usage();
    TestDatabaseBlobInsert test = new TestDatabaseBlobInsert(
      args[0], args[1], args[2], args[3]);
    test.testAll();
  }

  private static void usage() {
    System.out.println(
      "Usage: TestDatabaseBlobInsert driver url username password");
    System.exit(1);
  }
}

I ran this code by setting up an ODBC source pointing to the MS SQL Server database Movies, and then running it with a maximum heap space of 256MB:

java -Xmx256m -classpath . TestDatabaseBlobInsert sun.jdbc.odbc.JdbcOdbcDriver jdbc:odbc:Movies sa ""

The result on my little notebook was the following:

Driver: sun.jdbc.odbc.JdbcOdbcDriver
Dropping old table
Creating new table
Inserting via BinaryStream
That took 78975ms
Inserting via setBytes()
That took 73419ms

Back to the issue at hand - how do we get this data out of the database? The seemingly easiest way is to do the following:

1:  PreparedStatement st = con.prepareStatement(
  "SELECT moviedata FROM MovieArchive WHERE title = ?");
2:  st.setString(1, "Babe");
3:  ResultSet rs = st.executeQuery();
4:  if (rs.next()) {
5:    byte[] data = rs.getBytes(1);
    }

This code can easily cause an OutOfMemoryError if the available heap memory is less than the size of the data that you are reading. Now for the 1'000'000 dollar question: where does the OutOfMemoryError occur? That depends on your driver. If you are using the iNet SPRINTA or the Avenir drivers, you will run out of memory on line 4, i.e. when you call rs.next(). If you are using the DataDirect, Microsoft or JDBC/ODBC bridge drivers, you will only get the OutOfMemoryError on line 5, when you call rs.getBytes(1).

How can we write this so that we won't get an OutOfMemoryError? Here is some sample code. It is very important that you read the data in blocks of bytes at a time, rather than in one big chunk, otherwise your system will definitely not scale to support many users.

import java.sql.*;
import java.io.*;

public class TestDatabaseBlobFetch {
  private static final String TABLE_SELECT =
    "SELECT moviedata FROM MovieArchive WHERE title = ?";

  private final Connection con;

  public TestDatabaseBlobFetch(String driver, String url,
      String user, String password)
      throws SQLException, ClassNotFoundException {
    Class.forName(driver);
    con = DriverManager.getConnection(url, user, password);
    System.out.println("Driver: " + driver);
  }

  public void testSelectBlocksAtATime() throws SQLException {
    long start = -System.currentTimeMillis();
    System.out.println("SELECT: 64kb blocks at a time");
    PreparedStatement stmt = con.prepareStatement(TABLE_SELECT);
    stmt.setString(1, "Babe");
    ResultSet rs = stmt.executeQuery();
    int count=0;
    if (rs.next()) {
      try {
        System.out.println("Retrieving Data");
        OutputStream out = new BufferedOutputStream(
          new FileOutputStream("Data.1"));
        InputStream in = new BufferedInputStream(
          rs.getBinaryStream(1));
        byte[] buf = new byte[65536];
        int i;
        while((i = in.read(buf, 0, buf.length)) != -1) {
          out.write(buf, 0, i);
          count += i;
        }
        out.close();
      } catch(IOException ex) { ex.printStackTrace(); }
    }
    System.out.println("fetched " + count + " bytes");
    start += System.currentTimeMillis();
    System.out.println("That took " + start + "ms");
    stmt.close();
  }

  public void testSelectWithGetBytes() throws SQLException {
    long start = -System.currentTimeMillis();
    System.out.println("SELECT: all at once");
    PreparedStatement stmt = con.prepareStatement(TABLE_SELECT);
    stmt.setString(1, "Babe");
    ResultSet rs = stmt.executeQuery();
    byte[] data = null;
    if (rs.next()) {
      System.out.println("Retrieving Data");
      data = rs.getBytes(1);
      try {
        FileOutputStream out = new FileOutputStream("Data.2");
        out.write(data, 0, data.length);
        out.close();
      } catch(IOException ex) { ex.printStackTrace(); }
    }
    System.out.println("fetched " + data.length + " bytes");
    start += System.currentTimeMillis();
    System.out.println("That took " + start + "ms");
    stmt.close();
  }

  public void testAll() throws SQLException {
    testSelectBlocksAtATime();
    testSelectWithGetBytes();
  }

  public static void main(String[] args) throws Exception {
    if (args.length != 4) usage();
    TestDatabaseBlobFetch test = new TestDatabaseBlobFetch(
      args[0], args[1], args[2], args[3]);
    test.testAll();
  }

  private static void usage() {
    System.out.println(
      "usage: TestDatabaseBlobFetch driver url username password");
    System.exit(1);
  }
}

I tried this with several JDBC drivers; the only Type 4 driver that worked correctly was the DataDirect driver (now released under the Microsoft label). I will not go into the differences between the Microsoft driver and the others; that's for another article. If you want to try this out, you can run it like this:

java -Xmx2m -classpath .;msbase.jar;mssqlserver.jar;msutil.jar
  TestDatabaseBlobFetch com.microsoft.jdbc.sqlserver.SQLServerDriver
  jdbc:microsoft:sqlserver://localhost:1433;DatabaseName=Movies sa ""

Naturally you have to download the Microsoft SQL Server Type 4 driver and put the jar files into the directory from which you are running this code. The output from using the DataDirect Microsoft driver is the following on my machine:

Driver: com.microsoft.jdbc.sqlserver.SQLServerDriver
SELECT: 64kb blocks at a time
Retrieving Data
fetched 26214400 bytes
That took 62746ms
SELECT: all at once
Retrieving Data
Exception in thread "main" java.lang.OutOfMemoryError
        <<no stack trace available>>

The iNet SPRINTA driver falls over much sooner - actually when you call rs.next():

Driver: com.inet.tds.TdsDriver
SELECT: 64kb blocks at a time
Exception in thread "main" java.lang.OutOfMemoryError
        <<no stack trace available>>

What Is Wrong Here?

I am finding it very hard to think of a reason to store 25MB files in a database. They are too big to stay in the database's cache for very long. I think that the design is flawed to start with. I would personally rather store the URL to the file in the database, instead of the actual data, and then retrieve the data directly from the file system, as in the sketch below.
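
To make that concrete, here is a minimal sketch of the alternative, assuming a hypothetical table that stores only the location of the file (the table, column and class names are mine):

import java.io.*;
import java.sql.*;

// Sketch: the database holds only the path to the movie file;
// the bytes themselves are streamed straight from the file system.
public class MovieFileLookup {
  public static InputStream openMovie(Connection con, String title)
      throws SQLException, IOException {
    PreparedStatement st = con.prepareStatement(
      "SELECT moviefile FROM MovieFiles WHERE title = ?");
    st.setString(1, title);
    ResultSet rs = st.executeQuery();
    if (!rs.next()) {
      st.close();
      throw new FileNotFoundException("No movie called " + title);
    }
    String path = rs.getString(1);
    st.close();
    // the caller still reads the data in blocks, but now from disk
    return new BufferedInputStream(new FileInputStream(path));
  }
}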

I don't know all the reasons why someone might want to do that, but just remember: you have to stream such big data out of the database chunk by chunk, otherwise you have a serious problem.

Until the next issue ...

Heinz

 
