Rust and Crates

Introduction

Rust is a brilliant programming language that has taken the development world by storm. It combines performance, memory safety and concurrency, making it a strong fit for a wide variety of use cases.

Rust falls into a category we call "multi-paradigm" programming languages. It borrows some functional programming concepts, but its design is centered on memory safety and performance, with a particular lean towards systems programming and concurrency.

There are a couple of key conceptual pieces that I have learned about Rust that I'd like to share. Let's start.

The Rust Compiler (rustc)

The Rust compiler (rustc) is what is known as a self-hosting compiler. This means the compiler is written in the same language that it compiles: rustc is itself written in Rust. Interestingly enough, the first Rust compiler was written in OCaml, a powerful functional programming language.

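LLVM is the default code generation backend of the Rust compiler. The name originally stood for "Low Level Virtual Machine", although the project no longer treats it as an acronym. At a high level, there are two steps involved: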

  1. rustc converts your code to an "intermediate representation" (IR). Internally it lowers the code through its own representations first (HIR and MIR) and then emits LLVM IR. The reason for this indirection is the diverse range of target architectures (x86, ARM, etc.): generating machine code directly for every processor that exists today or comes out in the future would be a much more complicated implementation.
  2. This intermediate representation is then passed on to LLVM, which optimizes it and converts it into machine code for the target, meaning the binary instructions the processor actually executes.

Interestingly enough, LLVM is written in C++, and is an extremely mature piece of software that will likely remain foundational to a lot of programming languages for years to come. It came out of a research project started by Chris Lattner at UIUC. If you aren't familiar with Chris Lattner, take a look at his recent company Modular AI. They are building Mojo, a Python-like language that aims for CUDA-level performance in GPU programming. Cool, right?

Cargo

Simply put, Cargo is the package manager and build tool for Rust. It helps us build, run, and test projects and manage dependencies super easily. Cargo manages crates: it fetches them and ensures they are properly compiled and linked.

Crates 

Rust lives and breathes off open source. There is a massive crate registry here called crates.io. You're probably wondering, what are crates?

A crate is Rust's unit of compilation and packaging. It can be a binary crate, which compiles to an executable (with a main function as its entry point), or a library crate, which exposes reusable functions, types, etc. for other code to pull in. One crate can depend on another crate, which is a foundational concept for modularity in Rust.
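To make "one crate using another" concrete, here is a minimal sketch that pulls a type out of the standard library, which is itself a crate named std. The function name word_counts is hypothetical and only for illustration. A third-party crate from crates.io would be imported with the same use syntax, after being listed under [dependencies] in Cargo.toml.

```rust
// `std` is a crate that every Rust program can use without declaring it.
// `use` brings one of its types into scope by path: crate::module::Type.
use std::collections::HashMap;

// Hypothetical helper: counts how often each whitespace-separated word appears.
fn word_counts(text: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // `entry` inserts 0 the first time a word is seen, then we increment.
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let counts = word_counts("cargo builds crates and cargo tests crates");
    println!("{counts:?}");
}
```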

Compile Time Enforcement

Rust makes its guarantees to the developer and the system by having the compiler enforce rules. This enforcement is driven by three principles: ownership, borrowing, and lifetimes. The part of the compiler that enforces these rules is the borrow checker. Let's expand on the three below.

Ownership

Every value in Rust has a single owner. When the owner goes out of scope, the value is dropped and its memory is deallocated. For example:
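A minimal sketch of that idea, assuming a hypothetical helper named greeting_length that exists only so the inner scope has something to return:

```rust
// Hypothetical helper: creates a String in an inner scope and returns its length.
fn greeting_length() -> usize {
    let len;
    {
        // `name` owns the heap-allocated String from this point on.
        let name = String::from("My Name is Omeed");
        len = name.len();
        println!("{name}");
    } // `name` goes out of scope here: Rust drops it and frees the heap buffer.

    // println!("{name}"); // would not compile: `name` is not in scope anymore
    len
}

fn main() {
    println!("length was {}", greeting_length());
}
```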

This string will live and die within the boundaries of the curly braces. When the program reaches the closing brace, the scope ends, and Rust automatically calls drop on the string, which deallocates the heap memory that was holding "My Name is Omeed".

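Now you can also transfer ownership, which Rust calls a move. It looks like an ordinary assignment, but the original variable can no longer be used afterwards. For example, if I have a String s1 and I set s2 = s1, then s1 is invalidated and only s2 owns the data. (Simple types like integers implement the Copy trait, so with x = 5 and y = x, both x and y remain usable.) This is a memory safety feature: because exactly one owner ever frees the memory, you avoid accessing freed memory or freeing the same memory multiple times.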
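Here is a short sketch of the difference between a move and a copy. The function name move_and_copy is hypothetical. Heap-owning types such as String are moved on assignment, while integers implement Copy and are simply duplicated.

```rust
// Hypothetical helper contrasting Copy types with moved types.
fn move_and_copy() -> (i32, String) {
    // i32 implements `Copy`: `y = x` duplicates the value, so `x` stays valid.
    let x = 5;
    let y = x;
    let sum = x + y; // both are still usable

    // String does not implement `Copy`: `s2 = s1` moves ownership to `s2`.
    let s1 = String::from("hello");
    let s2 = s1;
    // println!("{s1}"); // error[E0382]: borrow of moved value: `s1`

    (sum, s2)
}

fn main() {
    let (sum, s) = move_and_copy();
    println!("{sum} {s}");
}
```

If you really do want two independent strings, s1.clone() makes an explicit deep copy and leaves s1 usable.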

Borrowing

Let's say we don't want to take ownership at all. We can borrow a value instead, using references. In Rust, the referencing symbol is &. It is important not to confuse this with the address-of operator in C, which produces a raw pointer with no compile-time checks. It is also easy to mix up with C++ references, which are declared with & too: a C++ reference is essentially an unchecked alias to an existing variable, whereas a Rust reference is a borrowed pointer, either immutable (&T) or mutable (&mut T), whose validity the compiler verifies. Ok, I know that was a lot... but let's break down borrowing a bit more now.

In Rust borrowing, at any given time you can have either one mutable reference or any number of immutable references to a value. Here's the catch: they cannot co-exist. Let's say I have a string x. I can do 20 immutable borrows if I want using the & operator. But... if I try to do a mutable borrow on x using &mut while any of those immutable borrows is still in use, the compiler will enforce its rule and compilation will fail.
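A minimal sketch of the rule, using a hypothetical function borrow_demo. The commented-out line is the one the compiler would reject. Once the immutable references have been used for the last time, a mutable borrow is allowed again.

```rust
// Hypothetical helper showing shared and mutable borrows of the same String.
fn borrow_demo() -> String {
    let mut x = String::from("hello");

    // Any number of immutable borrows may exist at the same time.
    let r1 = &x;
    let r2 = &x;

    // let m = &mut x; // error[E0502]: cannot borrow `x` as mutable
    //                 // because it is also borrowed as immutable

    println!("{r1} {r2}"); // last use of r1 and r2; their borrows end here

    // No immutable borrow is in use anymore, so one mutable borrow is fine.
    let m = &mut x;
    m.push_str(", world");

    x
}

fn main() {
    println!("{}", borrow_demo());
}
```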

This might not seem shocking, but it is a CORE reason that data races do not happen in safe Rust. In a concurrent system, if the same data could be shared and mutated simultaneously, one thread could read a variable while another thread in your system is modifying it, which is not good... the first thread could see garbage because the second thread overwrote the value in the middle of a read.

You're probably thinking... doesn't this make the language feel a bit "limiting" compared to C++, where any number of pointers and references can alias and mutate the same object? True. You aren't wrong. But it guarantees a level of memory safety that can be quite powerful in a "mission-critical" software engineering setting. Life is all about trade-offs, and this is one of them.

Additionally, when you do need shared mutable state, there are tools for it: synchronization types (Mutex, for example), ownership transfer between threads, lifetimes (which I'll talk about in the next section), etc.
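As a rough sketch of how Mutex fits in, the hypothetical function parallel_count below shares one counter across several threads. Arc provides shared ownership across threads, and Mutex ensures only one thread mutates the value at a time. Both types come from the standard library.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: `threads` threads each add 1 to a shared counter
// `per_thread` times, and the final total is returned.
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    // Arc = atomically reference-counted shared ownership.
    // Mutex = exclusive access to the value inside while the lock is held.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            // `move` transfers this clone of the Arc into the new thread.
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // The lock guard is released at the end of the statement.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", parallel_count(4, 1000));
}
```

Sharing a plain &mut usize across the threads instead would be rejected at compile time, which is the borrowing rule doing its job.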

Lifetimes

In a lot of programming languages, like Java or Python, there is automatic garbage collection... which naturally comes with runtime overhead. In C++, there is no automated garbage collection, and memory management is essentially up to the user: you are the garbage collector in a sense. This granular control can in turn lead to better performance and more predictability in your code. Rust takes neither approach: there is no garbage collector and no manual freeing. Instead, ownership determines when memory is freed, and lifetimes let the compiler check that no reference is used after that point.

The way to think about lifetimes is that the compiler is trying to prevent a reference from outliving the value it points to. If I have a variable x, I don't want references y and z to outlive x. Once x is deallocated, any use of y or z must be rejected as invalid. We say invalid rather than freed because references don't own any memory themselves; their lifetimes are simply enforced by the borrow checker at compile time.
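A small sketch of both cases, with a hypothetical function valid_reference. The references y and z are fine because x outlives them. The commented-out block in main shows the kind of code the borrow checker rejects.

```rust
// Hypothetical helper: y and z borrow x and are only used while x is alive.
fn valid_reference() -> i32 {
    let x = 5;
    let y = &x;
    let z = &x;
    *y + *z
} // x, y and z all go out of scope together, so nothing dangles.

fn main() {
    println!("{}", valid_reference());

    // The version below does not compile, because the reference would
    // outlive the value it points to:
    //
    // let dangling;
    // {
    //     let x = 5;
    //     dangling = &x;
    // } // x is dropped here
    // println!("{dangling}"); // error[E0597]: `x` does not live long enough
}
```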

Usually, the compiler infers lifetimes from how references are used throughout your code, so you never have to write them out. But in scenarios where the relationship is ambiguous, such as a function that takes several references and returns one, you annotate the function signature (or struct definition) with a named lifetime parameter like 'a to tell the compiler how the lifetimes relate. The annotation doesn't change how long anything lives; it only describes the relationship so the borrow checker can verify it.
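The classic illustration is a function that returns one of two string slices. This is a sketch, and the name longest is just a conventional example name. Without the 'a annotation the compiler cannot tell which input the returned reference borrows from, so it asks you to spell it out.

```rust
// 'a ties the returned reference to both inputs: the result is only valid
// for as long as both `a` and `b` are valid.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() {
        a
    } else {
        b
    }
}

fn main() {
    let first = String::from("crate");
    let second = String::from("cargo build");
    let result = longest(&first, &second);
    println!("{result}");
}
```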

Final Thoughts 

Overall, I hope this post has been an educational look at some of the baseline concepts of Rust and what makes it so powerful.