Java String getBytes() Method

This article walks through the Java getBytes() method and its various syntaxes, providing a comprehensive understanding of how to convert a String into a sequence of bytes in different character encodings. It introduces the concept of character encoding and explains the significance of the platform's default encoding.

We go through the three distinct forms of the getBytes() method: one with no parameters, one with a Charset parameter, and one with a charset name parameter. We also cover common character encodings such as UTF-8, UTF-16, UTF-16BE, UTF-16LE, US-ASCII, and ISO-8859-1, providing insights into their differences and use cases.

What is getBytes() in Java?

The getBytes() method in Java is used to convert a String into a sequence of bytes, utilizing the default character encoding of the platform. The resulting bytes are then stored in a newly created byte array. In Java, an array is an object that holds elements of the same data type, and a byte represents the smallest addressable unit of memory.

In Java, the size of an int data type is 4 bytes, while a char data type occupies 2 bytes. The getBytes() method can optionally take the character encoding as a parameter, supplied either as a Charset object or as a charset name. If the provided charset name is invalid or unsupported, the method throws an exception.
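
For illustration, here is a minimal sketch (using a made-up charset name, "NOT-A-CHARSET", purely to trigger the failure) of what happens when the charset-name form is given an invalid encoding:

import java.io.UnsupportedEncodingException;
class Main 
{
  public static void main(String[] args) 
  {
    String str = "FirstCode";
    try 
    {
      // "NOT-A-CHARSET" is not a real charset name, so getBytes() throws
      byte[] byteArray = str.getBytes("NOT-A-CHARSET");
      System.out.println(byteArray.length);
    } catch (UnsupportedEncodingException e) 
    {
      System.out.println("Unsupported encoding: " + e.getMessage());
    }
  }
}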

Different types of the getBytes() method in Java

In Java, the getBytes() method has three different syntaxes; let us first take a look at all three and then look at an example utilizing each one to understand its implementation. The three syntaxes of the getBytes() method are shown below:

string.getBytes()

string.getBytes(Charset charset)

string.getBytes(String charsetName)

The first syntax converts the String into a sequence of bytes using the platform’s default character encoding and stores the result in a new byte array. The second syntax encodes the String into a byte array using the character encoding given by the Charset object “charset”.

The third syntax converts the String into bytes using the character encoding identified by its name, specified by “charsetName”. Let us take a look at an example for each syntax to understand better.

Using getBytes() without any parameters

import java.util.Arrays;
class Main 
{
  public static void main(String[] args) 
  {
    String str = "FirstCode";
    byte[] byteArray;
    // Encode using the platform's default character encoding
    byteArray = str.getBytes();
    System.out.println(Arrays.toString(byteArray));
  }
}

Output:

[70, 105, 114, 115, 116, 67, 111, 100, 101]

The code shown above converts a string, “FirstCode,” into a sequence of bytes. In the main method, a String variable str is initialized with the value “FirstCode.” Next, a byte array byteArray is declared. Using the getBytes() method without specifying a character encoding, the String is converted into bytes based on the default character encoding of the platform. The resulting byte values are stored in the byteArray.

The Arrays.toString(byteArray) method is then utilized to convert the byte array into a human-readable string representation, and this output is printed to the console. The displayed output, [70, 105, 114, 115, 116, 67, 111, 100, 101], represents the ASCII values of the characters in the “FirstCode” string. Each number corresponds to the ASCII code of the respective character in the string. For example, ‘F’ has an ASCII value of 70, ‘i’ has 105, ‘r’ has 114, and so on.
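
As a quick illustration (not part of the original example), each character can be printed next to its encoded byte value; this assumes ASCII text, where every character encodes to exactly one byte:

class Main 
{
  public static void main(String[] args) 
  {
    String str = "FirstCode";
    byte[] byteArray = str.getBytes();
    // For ASCII text, index i of the string lines up with index i of the array,
    // e.g. 'F' -> 70, 'i' -> 105
    for (int i = 0; i < byteArray.length; i++) 
    {
      System.out.println(str.charAt(i) + " -> " + byteArray[i]);
    }
  }
}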

Using getBytes() by setting the “charset” parameter

As we have seen above, the charset parameter encodes the String into a byte array using the specified character encoding. The getBytes() method commonly accepts the following charsets: UTF-8 is the eight-bit UCS Transformation Format, while UTF-16 is the sixteen-bit UCS Transformation Format.

UTF-16BE represents the sixteen-bit UCS Transformation Format with big-endian byte order, while UTF-16LE signifies the same format with little-endian byte order. US-ASCII is a character encoding standard employing seven bits for ASCII representation, while ISO-8859-1 stands for ISO Latin Alphabet No. 1, using eight bits for character encoding. Let us take a look at a code that utilizes all these parameters.

import java.util.Arrays;
import java.nio.charset.Charset;
class Main 
{
  public static void main(String[] args) 
  {
    String str = "FirstCode";
    byte[] byteArray;
    // Encode the same string with several standard charsets
    // and print the resulting byte arrays
    byteArray = str.getBytes(Charset.forName("UTF-8"));
    System.out.println(Arrays.toString(byteArray));
    byteArray = str.getBytes(Charset.forName("UTF-16"));
    System.out.println(Arrays.toString(byteArray));
    byteArray = str.getBytes(Charset.forName("UTF-16BE"));
    System.out.println(Arrays.toString(byteArray));
    byteArray = str.getBytes(Charset.forName("UTF-16LE"));
    System.out.println(Arrays.toString(byteArray));
    byteArray = str.getBytes(Charset.forName("US-ASCII"));
    System.out.println(Arrays.toString(byteArray));
    byteArray = str.getBytes(Charset.forName("ISO-8859-1"));
    System.out.println(Arrays.toString(byteArray));
  }
}

Output:

[70, 105, 114, 115, 116, 67, 111, 100, 101]
[-2, -1, 0, 70, 0, 105, 0, 114, 0, 115, 0, 116, 0, 67, 0, 111, 0, 100, 0, 101]
[0, 70, 0, 105, 0, 114, 0, 115, 0, 116, 0, 67, 0, 111, 0, 100, 0, 101]
[70, 0, 105, 0, 114, 0, 115, 0, 116, 0, 67, 0, 111, 0, 100, 0, 101, 0]
[70, 105, 114, 115, 116, 67, 111, 100, 101]
[70, 105, 114, 115, 116, 67, 111, 100, 101]
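
In the output above, the UTF-8, US-ASCII, and ISO-8859-1 arrays are identical because every character in “FirstCode” falls within the seven-bit ASCII range. UTF-16 prepends a byte order mark (-2, -1) and encodes each character in two bytes, while UTF-16BE and UTF-16LE omit the mark and differ only in whether the high-order or low-order byte comes first.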

Using getBytes() by setting the “charsetName” parameter

import java.io.UnsupportedEncodingException;
import java.util.Arrays;
class Main 
{
  public static void main(String[] args) 
  {
    String str = "Java";
    byte[] byteArray;
    try 
    {
      // getBytes(String charsetName) throws a checked exception
      // if the named charset is not supported
      byteArray = str.getBytes("UTF-8");
      System.out.println(Arrays.toString(byteArray));
      byteArray = str.getBytes("UTF-16");
      System.out.println(Arrays.toString(byteArray));
    } catch (UnsupportedEncodingException e) 
    {
      System.out.println("Unsupported encoding: " + e.getMessage());
    }
  }
}

Output:

[74, 97, 118, 97]
[-2, -1, 0, 74, 0, 97, 0, 118, 0, 97]

The code shown above uses the getBytes(String charsetName) method to convert the String “Java” into byte arrays using different character encodings. In the main method, a String variable str is initialized with the value “Java.” Inside a try-catch block, the code attempts to convert the string into bytes using both the UTF-8 and UTF-16 character encodings.

For the UTF-8 encoding, the resulting byte array [74, 97, 118, 97] is printed to the console using Arrays.toString(byteArray). Each number in the array represents the UTF-8 encoded byte value for the corresponding character in the “Java” string.

For the UTF-16 encoding, the byte array [-2, -1, 0, 74, 0, 97, 0, 118, 0, 97] is displayed. In UTF-16, a BOM (Byte Order Mark) is represented by the first two bytes, -2 and -1. The subsequent bytes then represent the UTF-16 encoded values for the characters in the “Java” string.
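
The values -2 and -1 appear because Java bytes are signed; the BOM bytes 0xFE and 0xFF wrap around to negative numbers. A short sketch (not from the original example) makes this explicit:

class Main 
{
  public static void main(String[] args) 
  {
    // The UTF-16 byte order mark is 0xFE 0xFF; as signed Java bytes
    // these values print as -2 and -1
    byte bomHigh = (byte) 0xFE;
    byte bomLow = (byte) 0xFF;
    System.out.println(bomHigh + " " + bomLow);
  }
}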

In case of an exception during the encoding process, such as an unsupported character set, the catch block catches it, and an error message is printed to the console indicating that the encoding is unsupported.
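
As a side note, the Charset overload does not declare a checked exception, so passing a constant from java.nio.charset.StandardCharsets avoids the try-catch entirely; a minimal sketch of that alternative:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
class Main 
{
  public static void main(String[] args) 
  {
    String str = "Java";
    // StandardCharsets.UTF_8 is always available, so no exception handling is needed
    byte[] byteArray = str.getBytes(StandardCharsets.UTF_8);
    System.out.println(Arrays.toString(byteArray));
  }
}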

Conclusion

To sum up, this article teaches Java programmers how to use the getBytes() method for converting Strings to bytes. It emphasizes the importance of understanding character encodings, which is like knowing the language in which data is written. By giving practical examples with different encoding options, the article helps programmers get hands-on experience.

In a nutshell, this knowledge is not just about writing code but understanding how data behaves. It helps programmers make smart choices, ensuring their code works well in different situations. As Java programming evolves, this article equips developers with practical skills and a mindset for handling data effectively.
