A quick primer on buffers in Node.js
Understand what exactly buffers are in programming and in Node, and how to work with the Buffer class.
Node brings system handling capabilities into JavaScript. Things like working with files (including binary files, of course), network sockets, multi-threading, and so on, are all normal for Node.
Much of this relies on working with binary data efficiently, and that's precisely where buffers enter the game. Let's understand what they are and how to work with them.
What are buffers?
The concept of a buffer in programming is a pretty simple one. It is a chunk of memory where data is stored. And that's just it — a small area of memory to store data.
In Node, a buffer represents the same concept. We get to easily store data in memory, read individual bytes, transform those bytes, delete certain bytes, and whatnot.
As stated earlier, because the environment in which Node operates intrinsically revolves around binary data, having a robust API for dealing with binary data (and the skills to work with it) is important.
This API — a fairly low-level one — is Buffer.
Buffer wrapped by new, core JavaScript APIs
Ever since the advent of ES6, JavaScript has gotten native provision of array buffers and typed arrays (to lay out views over those buffers). In other words, core JavaScript already provides us with a plethora of interfaces to work with binary data.
But Node historically had its own implementation for buffers — that is, the Buffer API. Eventually, it got merged with the native implementation of buffers.
So today, Buffer in Node is basically just an extension of the built-in Uint8Array class in JavaScript. Having said that, Buffer exists to date and Node extensively uses it internally in its different modules.
So while you could also directly interface with the core buffer APIs in JavaScript, I would instead recommend sticking to the Buffer API.
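If you'd like to see this relationship for yourself, here's a minimal sketch. It only relies on instanceof and Buffer.isBuffer(); the variable names are purely illustrative.

```javascript
// Buffer is available as a global in Node, so no import is strictly needed here.
const buf = Buffer.alloc(2);

// A Buffer instance is also a Uint8Array instance.
const isTypedArray = buf instanceof Uint8Array;
console.log(isTypedArray); // true

// Buffer.isBuffer() tells a Buffer apart from a plain Uint8Array.
const plain = new Uint8Array(2);
console.log(Buffer.isBuffer(buf)); // true
console.log(Buffer.isBuffer(plain)); // false
```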
Creating a buffer
There are a handful of ways of creating a buffer in Node. The three most common ones are:
- Providing an integer representing the size of the buffer, in bytes.
- Providing a string to store in the buffer.
- Copying an existing buffer.
And there are even more granular ways but I'll avoid making this discussion complex and rather focus on these most common approaches.
Providing a size in bytes
One of the most straightforward ways to create a new buffer is to provide a size, in units of bytes, to the Buffer.alloc() static method.
Syntactically, this could be expressed as follows (size is an integer representing the size of the buffer, in bytes):
Buffer.alloc(size)
For example, let's say you want to create a buffer spanning 4 bytes of memory. Here's how you could do it:
import { Buffer } from 'node:buffer';
let buffer = Buffer.alloc(4);
console.log(buffer);
<Buffer 00 00 00 00>
Take note of the way in which the buffer is presented here...
When we log a buffer in Node, its contents are dumped into the console. Each byte's value is converted into a hexadecimal number and this number is printed.
In the log shown above, notice the four 00s. This means that there are a total of 4 bytes in the buffer, each holding the number 0 in it (which is denoted as 00 in hexadecimal).
Hexadecimal keeps the output compact: 255 (three digits) in decimal is equivalent to ff (two digits) in hexadecimal; clearly, ff is shorter.
An important thing to note regarding Buffer.alloc() is that it returns a Buffer instance whose individual bytes are prefilled with 0 if we don't specify any other fill value at the time of invocation.
Speaking of which, there's another overloaded form of Buffer.alloc() where we can specify the prefill value:
Buffer.alloc(size, fill)
fill can be a number or a string (in which case, it's encoded into a list of numbers; we'll learn more about this later when we understand the notion of encoding in buffers).
For example, consider the following:
import { Buffer } from 'node:buffer';
let buffer = Buffer.alloc(10, 'ab');
console.log(buffer);
<Buffer 61 62 61 62 61 62 61 62 61 62>
The repeated pattern 61 62 here represents the bytes for the characters a and b in the given fill value 'ab'. As for where these numbers come from, I'll discuss that very soon below.
Generally, we don't need this form of Buffer.alloc() because a prefill of 0 is more than sufficient for most cases.
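For completeness, here's a small sketch of the numeric form of the fill argument next to the default. The sizes and fill values are arbitrary picks for illustration.

```javascript
// Buffer is available as a global in Node, so no import is strictly needed here.

// No fill value: every byte starts out as 0.
const zeros = Buffer.alloc(3);
console.log(zeros); // <Buffer 00 00 00>

// Numeric fill: every byte holds 255, which is printed as ff in hexadecimal.
const filled = Buffer.alloc(3, 255);
console.log(filled); // <Buffer ff ff ff>

// The length property reports the size of the buffer in bytes.
console.log(filled.length); // 3

// Number.prototype.toString(16) performs the same decimal-to-hex conversion.
console.log((255).toString(16)); // ff
```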
Providing a string
Another possible way to create a new buffer is to use a string and translate it to a series of bytes. This can be done using Buffer.from().
There are many forms of Buffer.from(). The one that we're interested in right now is when the given argument is a string:
Buffer.from(str)
Being able to directly interface with buffers in Node in terms of strings is a feature that's not currently enjoyed by browsers.
Browsers instead handle encoding and decoding with TextEncoder and TextDecoder, respectively. They allow us to transition back and forth between a string and a buffer in JavaScript.
Remember that a buffer always stores numbers, or bytes, so to speak (and each byte is just a number). So when we try to store a string in a buffer, we don't really store the string as it is but rather store its individual byte values. For example, storing the string 'ab' means storing the individual numbers 97 and 98 (61 and 62, respectively, in hexadecimal).
Similarly, when we access the contents of a buffer as a string, each number is converted into a character. For example, the number 100 becomes the character 'd' whereas 65 becomes 'A', and so on. Likewise a buffer with these two bytes, 100 followed by 65, translates to the string 'dA'.
The question is: Where do these numbers come from? Well, they come from the code units corresponding to the characters in UTF-8. For instance, the code unit of the character 'd' is 100, so when stored in a buffer, 'd' becomes 100. Simple!
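The mapping described above can be checked with a quick sketch. Note that it uses the array form of Buffer.from(), which this article doesn't otherwise cover, and that the match with charCodeAt() shown here is only claimed for plain ASCII characters.

```javascript
// Buffer is available as a global in Node, so no import is strictly needed here.

// Storing 'ab' stores the numbers 97 and 98.
const fromString = Buffer.from('ab');
console.log(fromString[0], fromString[1]); // 97 98

// For these ASCII characters, charCodeAt() reports the same numbers.
console.log('a'.charCodeAt(0), 'b'.charCodeAt(0)); // 97 98

// Going the other way: the bytes 100 and 65 read back as 'dA'.
const fromBytes = Buffer.from([100, 65]);
const text = fromBytes.toString();
console.log(text); // dA
```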
Time for an example. Below we create a buffer from the string 'hello':
import { Buffer } from 'node:buffer';
let buff = Buffer.from('hello');
console.log(buff);
<Buffer 68 65 6c 6c 6f>
See how there is no need to specify the byte length for the buffer that's being created; Node itself figures this out based on the length of the string. In this case, the buffer spans 5 bytes because the given string's length is 5 (and also because in the UTF-8 encoding, each of the shown characters takes up 1 byte).
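To see why the one-byte-per-character caveat matters, here's a hedged sketch comparing a string's length with its byte length. The sample strings are my own; Buffer.byteLength() is a static helper that isn't discussed elsewhere in this article.

```javascript
// Buffer is available as a global in Node, so no import is strictly needed here.

const ascii = 'hello';
// '\u00e9' is the accented letter e (as in "cafe" with an accent), escaped to keep the source unambiguous.
const accented = 'h\u00e9llo';

// For plain ASCII text, string length and byte length agree.
console.log(ascii.length, Buffer.byteLength(ascii)); // 5 5

// The accented letter takes 2 bytes in UTF-8, so the buffer is 6 bytes long.
console.log(accented.length, Buffer.byteLength(accented)); // 5 6
console.log(Buffer.from(accented)); // <Buffer 68 c3 a9 6c 6c 6f>
```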
Copying an existing buffer
Another way to create a new buffer is to copy an existing buffer. This might be a practical thing if you wish to transform the contents of a buffer without affecting the original data.
Copying an existing Buffer instance is as simple as calling Buffer.from() on it. This effectively copies the entire memory allocated to the buffer.
Buffer.from(buffer)
Shown below is an example:
import { Buffer } from 'node:buffer';
let buff = Buffer.alloc(4, 10);
let buff2 = Buffer.from(buff);
console.log(buff);
console.log(buff2);
console.log(buff.buffer === buff2.buffer);
First, a buffer buff is created, spanning 4 bytes and initialized to have the number 10 filled throughout. Next up, this buffer is copied into buff2.
The first two logs simply print the contents of both the buffers, buff and buff2, to confirm whether their contents are the same or not. The third log confirms whether the internal memory slots assigned to both the buffers are different, since we don't ideally want the same buffer to be re-used.
The buffer property of a Buffer instance returns the lower-level ArrayBuffer instance which directly manages the contents in memory.
Here's the output of the code above:
<Buffer 0a 0a 0a 0a>
<Buffer 0a 0a 0a 0a>
false
Firstly, as can be seen, the contents of both the buffers are identical as per expectation.
Secondly, the log false clearly indicates that the internal chunks of memory belonging to both buff and buff2 are different.
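Another way to convince yourself that the copy is independent is to mutate it and check the original. The sketch below does that, and contrasts it with subarray(), a method not covered in this article that returns a view over the same memory instead of a copy.

```javascript
// Buffer is available as a global in Node, so no import is strictly needed here.

const original = Buffer.alloc(4, 10);
const copy = Buffer.from(original);

// Mutating the copy leaves the original untouched.
copy[0] = 255;
console.log(original); // <Buffer 0a 0a 0a 0a>
console.log(copy); // <Buffer ff 0a 0a 0a>

// By contrast, subarray() shares memory with the original,
// so a write through the view shows up in the original too.
const view = original.subarray(0, 2);
view[0] = 1;
console.log(original); // <Buffer 01 0a 0a 0a>
```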
ArrayBuffer is a core JavaScript API. Recall that the Buffer class in Node is basically a wrapper on top of Uint8Array and so it's merely a view over an ArrayBuffer. ArrayBuffer is the actual low-level representation in JavaScript of a chunk of memory.
Writing data
Now that we know of a multitude of ways of creating a buffer in Node, let's find out how to write data into a buffer and then read data out of it.
Bracket notation
As stated earlier, Buffer in Node is an extension of the native Uint8Array class in JavaScript. So naturally all the operations that are supported on Uint8Array are supported on Buffer too.
This means that we can leverage the very familiar bracket notation — as we use with arrays — to access individual bytes from a Buffer instance, and also to write to them.
However, be wary of the fact that when assigning a value to a byte, it must be an integer in the range of 0 - 255.
Don't assign characters!
JavaScript, by default, coerces the value assigned to the element of a Buffer instance into a number and then further normalizes the number before assigning the resulting value to the element.
For example, NaN becomes 0, and a value out of range, like 700, wraps around modulo 256 (so 700 is stored as 188) rather than being stored as is. This means that you won't get any benefit of doing the following:
let buff = Buffer.alloc(4, 1);
console.log(buff);
buff[0] = 'a';
console.log(buff);
Here, you might be thinking that assigning 'a' to buff[0] will put the character code of 'a' automatically at the given location but NO, that's not going to happen!
Instead, 'a' first gets converted into a number and then this number gets normalized and ultimately assigned to the given location in the buffer.
In this case, 'a' converts to the number NaN which normalizes to 0. As a result, the first byte will become 0 following the execution of buff[0] = 'a'.
Let's even confirm this by taking a glimpse into the console logs:
<Buffer 01 01 01 01>
<Buffer 00 01 01 01>
See? Before the assignment, each byte holds the decimal number 1 (which is 01 in hex) but after the assignment, the first byte becomes 0 (00 in hex).
So, if you want to assign a character to a given byte, don't forget that buffers do NOT entertain character assignments; you instead need to manually call charCodeAt() on the character before doing so. Something as follows:
let buff = Buffer.alloc(4, 1);
console.log(buff);
buff[0] = 'a'.charCodeAt();
console.log(buff);
<Buffer 01 01 01 01>
<Buffer 61 01 01 01>
Notice the value of the first byte post-assignment now — it's 61 (in hex) which is the character code of 'a'. Voila!
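The numeric normalization mentioned earlier follows the standard Uint8Array conversion rules. Here's a small sketch of it; the sample values are arbitrary and chosen only to cover a few distinct cases.

```javascript
// Buffer is available as a global in Node, so no import is strictly needed here.

const bytes = Buffer.alloc(4);

bytes[0] = 700; // wraps modulo 256: 700 - 512 = 188 (bc in hex)
bytes[1] = -1;  // wraps around to 255 (ff in hex)
bytes[2] = 3.7; // the fractional part is dropped, leaving 3
bytes[3] = NaN; // NaN becomes 0

console.log(bytes); // <Buffer bc ff 03 00>
```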
This granular way of mutating a buffer is really helpful, but oftentimes we want to write data all at once. For this, we have the write() instance method of the Buffer class. Let's explore it quickly.
The write() method
write() is one of the more low-level methods exposed by the Buffer class. First, let's see its straightforward syntax:
buff.write(str[, offset[, length[, encoding]]])
- write() works with strings, so the string to write to the buffer, that is, str, is the very first argument.
- offset represents the position where the writing must begin. By default, it's the very first byte, i.e. offset 0.
- length specifies the maximum number of bytes to write. By default, it's the number of bytes from offset to the end of the buffer. Writing never goes past the last byte of the buffer; if the string doesn't fit, only the part that fits is written.
- encoding specifies the encoding of the string. By default, it's 'utf8' and you don't typically need to worry about changing it.
Let's consider an example to help demystify this syntax.
In the code below, we create a fresh buffer, spanning 5 bytes, and then write the string 'hello' to it with the help of write() before finally logging it:
import { Buffer } from 'node:buffer';
let buff = Buffer.alloc(5);
buff.write('hello');
console.log(buff);
<Buffer 68 65 6c 6c 6f>
Notice how we skip the last three arguments to buff.write(). That's because there's no need for them.
- offset is omitted because we need 'hello' to be written starting at the very first byte in buff, which is the default.
- length is omitted because the entire length of the buffer needs to be written to, starting at position offset.
- encoding is omitted because... well... who worries about encoding anyway!
Here's another example, this time writing to a portion of a buffer:
import { Buffer } from 'node:buffer';
let buff = Buffer.from('soot');
console.log(buff);
buff.write('ea', 1); // Change 'soot' to 'seat'
console.log(buff);
<Buffer 73 6f 6f 74>
<Buffer 73 65 61 74>
The goal is to change the buffer's content from the string representation of 'soot' to the one for 'seat'. For this, instead of writing to the whole buffer, we only write the part 'ea', starting at offset 1.
Notice that in this case too, the length parameter isn't provided. This is because the string 'ea' is just two characters and writing it completely is exactly what we need.
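For the cases where length does matter, here's a hedged sketch. It also uses the return value of write(), which is the number of bytes written, and shows what happens when the string is longer than the buffer. The strings and sizes are made up for illustration.

```javascript
// Buffer is available as a global in Node, so no import is strictly needed here.

const target = Buffer.alloc(5);

// write() returns the number of bytes actually written.
const written = target.write('hi');
console.log(written); // 2
console.log(target); // <Buffer 68 69 00 00 00>

// With offset 2 and length 2, at most two bytes of 'hello' are written.
const limited = target.write('hello', 2, 2);
console.log(limited); // 2
console.log(target); // <Buffer 68 69 68 65 00>

// A string that doesn't fit is cut short rather than overflowing the buffer.
const small = Buffer.alloc(3);
const fitted = small.write('hello');
console.log(fitted); // 3
console.log(small.toString()); // hel
```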
The story doesn't end here. There are many more methods to write to a buffer in Node but I'll stop here because write() is probably sufficient for most cases.
Reading data
There are a handful of ways to read data from a buffer in Node just like there are to write data to it. Perhaps, the two most common ones are discussed up next.
Bracket notation
You already saw this above when writing to a Buffer. Naturally, if bracket notation works for writing, then it works for reading too.
As an example, in the following code, we create a buffer holding the string data for 'hello' before accessing its third byte (at index 2):
import { Buffer } from 'node:buffer';
let buff = Buffer.from('hello');
console.log(buff[2]);
108
The third byte represents the character code for 'l' which is 108 (6c in hex), hence the shown output.
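Bracket notation reads one byte at a time, but since a buffer is iterable like any typed array, you can also walk over all of its bytes. A minimal sketch follows; the use of String.fromCharCode() assumes ASCII content.

```javascript
// Buffer is available as a global in Node, so no import is strictly needed here.

const word = Buffer.from('hello');

// for...of visits each byte in order.
const codes = [];
for (const byte of word) {
  codes.push(byte);
}
console.log(codes); // [ 104, 101, 108, 108, 111 ]

// String.fromCharCode() turns a single byte back into a character (ASCII only).
console.log(String.fromCharCode(word[2])); // l

// Reading past the end gives undefined rather than throwing.
console.log(word[10]); // undefined
```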
Reading via toString()
Most classes in JavaScript provide a toString() method for coercion of their underlying values into strings. Following suit, Buffer also provides a toString() method to help convert it into a string, which is basically a way to read its contents.
Syntactically, toString() isn't a parameterless function unlike most toString()s in JavaScript. Instead, it allows us to control what portion of the buffer we want to read into the string.
buff.toString([encoding[, start[, end]]])
- encoding specifies the encoding of the string. By default, it's 'utf8'. (No need to worry about changing it.)
- start specifies the starting position of the portion to read. Default is 0. (Negative indexes don't work!)
- end specifies the ending position (not inclusive) of the portion to read. Default is buff.length. (Negative indexes don't work!)
Without any sort of arguments to toString(), the default behavior, as you can guess, is to read the entire buffer into a string.
One thing I particularly dislike about toString() is its awkward signature. Naturally, encoding shouldn't concern us much when trying to read a portion of a buffer, but with this signature, regardless of what we wish to read, we still need to provide a value for the encoding parameter.
Yes I know we can provide undefined as the encoding (in which case, it's assumed to be 'utf8') but the point is that we have to provide it. I feel that a better signature would've been to have encoding at the very end.
Time for an example.
In the following code, we have the same buffer as before. It's read twice — first the entire data and then only the last two bytes:
import { Buffer } from 'node:buffer';
let buff = Buffer.from('hello');
// Read the entire data
console.log(buff.toString());
// Read the last two bytes
console.log(buff.toString(undefined, 3));
hello
lo
toString() is a really useful instance method of the Buffer class. Make sure to get well-versed with it.
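Here's one more sketch that exercises all three parameters of toString(), including a non-default encoding. The ranges are arbitrary picks for illustration.

```javascript
// Buffer is available as a global in Node, so no import is strictly needed here.

const greeting = Buffer.from('hello');

// Read bytes 1 up to (but not including) 4.
const middle = greeting.toString('utf8', 1, 4);
console.log(middle); // ell

// Passing undefined as the encoding falls back to 'utf8'.
const head = greeting.toString(undefined, 0, 2);
console.log(head); // he

// A different encoding renders the same bytes differently.
const asHex = greeting.toString('hex');
console.log(asHex); // 68656c6c6f
```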
Buffers, strings, and encoding
If you've worked with Uint8Array before, you'll be aware of the fact that there is NO way to interface with it in terms of strings. Node, on the other hand, with its own Buffer class does allow this.
But behind the scenes, Node also normalizes strings to sequences of numbers. For example, when we do the following:
Buffer.from('hello');
we're basically creating a new buffer that has as its contents the bytes corresponding to the characters in 'hello'. Here's how this buffer looks when logged:
<Buffer 68 65 6c 6c 6f>
68 is the character code for 'h' in hexadecimal, 65 is for 'e', 6c is for 'l', and 6f is for 'o'.
Whenever we go from a character to its corresponding numeric code, we carry out what's called encoding. The reverse, which is to go from the numeric code to the character, is referred to as decoding.
When we give a string to Buffer.from() or to any buffer utility in Node, it first needs to be encoded into a sequence of numbers and then used by the buffer utility. Similarly, if we wish to output the contents of a buffer as a string, the sequence of numbers stored in it ought to be decoded.
Speaking of which, for both encoding and decoding, we further need an encoding format (which is often concisely referred to as encoding scheme or even just as encoding). It simply specifies which number a character corresponds to.
By far, the most widely used encoding format is UTF-8, whose underlying character set is Unicode. Each character in UTF-8 spans a minimum of 1 byte (8 bits) and a maximum of 4 bytes (32 bits).
Without going too deep into the implementation details of UTF-8, it's sufficient to know that UTF-8 is a very commonly used encoding format across the modern computing world, and in Node too.
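To make the 1-to-4-byte range concrete, here's a small sketch with one sample character per width. The characters are my own picks and are written as escapes so the source stays unambiguous.

```javascript
// Buffer is available as a global in Node, so no import is strictly needed here.

// 'a', accented e (U+00E9), the euro sign (U+20AC), and an emoji (U+1F600).
const samples = ['a', '\u00e9', '\u20ac', '\u{1F600}'];

// Number of bytes each character occupies in UTF-8.
const widths = samples.map((ch) => Buffer.from(ch).length);
console.log(widths); // [ 1, 2, 3, 4 ]

// The euro sign, for instance, is stored as three bytes.
console.log(Buffer.from('\u20ac')); // <Buffer e2 82 ac>
```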
By default, all buffer utilities in Node assume the encoding format to be UTF-8, unless stated otherwise (which isn't really required that much unless you're in an advanced setting, e.g. working with base64-encoded strings). The encoding format is specified as a string. UTF-8 is denoted as 'utf8'.
Node also accepts 'utf-8'; it's the same thing.
Shown below are some of the other encoding formats that Node supports at the time of this writing, apart from UTF-8:
- 'utf16le': represents UTF-16LE (little endian) whereby each character takes up a minimum of 2 bytes (16 bits) and a maximum of 4 bytes (32 bits); and the most significant byte comes last (a consequence of being little endian).
- 'latin1': represents the character encoding scheme ISO-8859-1 which always spans 1 byte for every character. Out of range numbers are normalized into a byte and then the corresponding character is used.
- 'base64': converts the sequence of numbers in the buffer to and from the popular base64 encoding format.
- 'hex': converts the sequence of numbers in the buffer to and from the hexadecimal format.
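Here's a hedged sketch of a few of these encodings in action, using the encoding parameter of Buffer.from() and toString(). The sample string is arbitrary.

```javascript
// Buffer is available as a global in Node, so no import is strictly needed here.

const hi = 'hi';

// The same string under two different encodings.
console.log(Buffer.from(hi, 'utf8')); // <Buffer 68 69>
console.log(Buffer.from(hi, 'utf16le')); // <Buffer 68 00 69 00>

// 'hex' and 'base64' turn the bytes into text...
const raw = Buffer.from(hi);
const asHex = raw.toString('hex');
const asBase64 = raw.toString('base64');
console.log(asHex, asBase64); // 6869 aGk=

// ...and decoding that text restores the original bytes.
const fromHex = Buffer.from(asHex, 'hex').toString();
const fromBase64 = Buffer.from(asBase64, 'base64').toString();
console.log(fromHex, fromBase64); // hi hi
```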
As I stated earlier, you'll mostly not need to worry about specifying an encoding because UTF-8 works wonders for almost all cases. It's only in advanced tasks, such as computing cryptographic hashes, that you may need to resort to other encodings like base64 or hexadecimal.
In general, you're all good! (That's a big relief, isn't it?)