What is a Garbage Collector?

When you write code, you inevitably need memory to keep track of the data you care about. This data can be a list of customers, the contents of a file, input from the user, or anything else. Unfortunately, the amount of available memory is limited, so while your program is running, you can't keep everything there forever. So how do we solve this problem? There are two ways:

  1. As soon as you no longer need a piece of memory, you release it yourself so it can be used for other purposes (maybe another program is running at the same time and could use that memory). This is what languages such as C and C++ do. It can be really tricky, because you (the developer) need to keep track of every allocation and make sure each one is released once it is no longer needed.
  2. You can delegate this to the runtime environment that executes your application. This approach is very common and is used by many popular programming languages, such as Python and Java.

The second approach needs a Garbage Collector (GC) to do the job. The GC is a piece of code that ships with the language runtime and handles memory management for you, so you can focus on the code you write rather than on managing memory.
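To make this concrete, here is a minimal sketch in Python. It assumes the standard CPython interpreter, and the `Customer` class and `load_customer` function are hypothetical names made up for this illustration. It uses a weak reference to observe an object without keeping it alive, then drops the only real reference and checks whether the runtime reclaimed the object.

```python
import gc
import weakref


class Customer:
    """Hypothetical stand-in for any data your program keeps in memory."""

    def __init__(self, name):
        self.name = name


def load_customer():
    customer = Customer("Alice")
    # A weak reference lets us observe the object without keeping it alive.
    observer = weakref.ref(customer)
    return customer, observer


customer, observer = load_customer()
still_alive_before = observer() is not None

# Drop the only strong reference. Note that we never free anything ourselves.
del customer
gc.collect()  # ask the runtime to collect right now (only needed for the demo)

still_alive_after = observer() is not None
print(still_alive_before, still_alive_after)
```

On CPython this should report that the object was alive before the `del` and gone afterwards. Other Python runtimes may reclaim memory at a different moment, which is exactly the point: the runtime decides, not you.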

The problem is that the GC has a lot of work to do. It needs to keep track of all memory allocations and of the references to those allocations in your code. Note that there can be N references to a single block of allocated memory. The GC has to account for all N of them, and it can only reclaim the block once none of those references is still reachable from your code. As a result, many collectors need to pause your application (often for a few milliseconds, sometimes longer) while they do the job: check memory allocations, check references, mark the allocations that can be reclaimed, and reclaim the marked allocations. Even a pause of a few milliseconds can be really expensive in some cases, for example in low-latency applications or real-time processing.
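The steps above can be sketched with a toy model. This is only an illustration of the general mark-and-sweep idea, not how Python, Java, or any real runtime implements its collector. `ToyHeap`, `allocate`, `roots`, and `collect` are all hypothetical names invented for this example.

```python
class ToyHeap:
    """A toy heap for illustration; real collectors are far more involved."""

    def __init__(self):
        self.blocks = {}   # block id -> set of block ids that block refers to
        self.roots = {}    # variable name -> block id (references in "your code")
        self.next_id = 0

    def allocate(self):
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = set()
        return block_id

    def collect(self):
        # Mark: everything reachable from a root reference is still in use.
        marked = set()
        pending = list(self.roots.values())
        while pending:
            block_id = pending.pop()
            if block_id in marked:
                continue
            marked.add(block_id)
            pending.extend(self.blocks[block_id])

        # Sweep: every block that was not marked can be reclaimed.
        freed = sorted(b for b in self.blocks if b not in marked)
        for block_id in freed:
            del self.blocks[block_id]
        return freed


heap = ToyHeap()
shared = heap.allocate()      # block 0
heap.roots["a"] = shared      # three references (N = 3) to the same block
heap.roots["b"] = shared
heap.roots["c"] = shared
heap.allocate()               # block 1: allocated, but nothing refers to it

first = heap.collect()        # only the unreferenced block is reclaimed
del heap.roots["a"]
del heap.roots["b"]
second = heap.collect()       # one reference is left, so block 0 survives
del heap.roots["c"]
third = heap.collect()        # no references left, so block 0 is reclaimed
print(first, second, third)   # prints [1] [] [0]
```

While `collect` is walking the references, the toy program does nothing else. That is a simplified picture of the pause described above.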

Java has introduced a new type of GC that uses concurrent techniques to keep this overhead to a minimum. It is a new, experimental GC algorithm, and you can check here for more information.