Let’s build a file encrypter with Python

Today we’re going to take a bit of a detour from web development and head more towards the realm of software development, while dunking our toes into the world of intrigue that is cryptography and information security. I used a lot of buzzwords there but do not be alarmed! I intend this tutorial to be accessible to all programming levels, although prior knowledge of Python or another object-oriented programming language, and knowing your way around the command line, will make it a lot easier to follow along. Even if this all sounds over your head, I encourage you to continue reading. There is no better way to learn than to tackle a hard problem, deconstruct it, and identify what you know and what you don’t know. If my computer science degree taught me anything, it’s that Google is my best friend. With that being said, let’s get this started!

Encryption

Let’s start by defining a key concept here: encryption. Encryption is the process of altering information or data into a form that is unrecognizable from its original state. It needs to be a reversible process (the reverse is called decryption) and should make it hard for unauthorized parties to obtain the information you are encrypting. Encryption algorithms take an input, referred to as plaintext, which is the data you want to encrypt, and output ciphertext, which should look like a garbled-up mess if everything was successful.

These algorithms use a key, which is just a long string of bits (1s and 0s), to handle encryption and decryption. This key works similarly to a physical key in real life. Just like your house key locks and unlocks your door, an encryption key locks and unlocks data. There are many kinds of encryption algorithms out there that use different sized keys. Typically, the larger the key size, the more possible keys an attacker has to try and the harder the encryption is to break. Check out this video about encryption that does a way better job of explaining how it works, without going too in depth.
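To get a feel for why key size matters, here is a rough back-of-the-envelope sketch. The guess rate of one trillion keys per second is an assumption I picked purely for illustration, and `keyspace` and `years_to_try_all_keys` are hypothetical helpers, not part of any library.

```python
# Rough illustration of how the number of possible keys grows with key size.
GUESSES_PER_SECOND = 10**12  # assumed attacker speed, chosen only for illustration
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

def keyspace(bits):
    """Number of distinct keys of the given length in bits."""
    return 2 ** bits

def years_to_try_all_keys(bits):
    """Years needed to try every key at the assumed guess rate."""
    return keyspace(bits) / (GUESSES_PER_SECOND * SECONDS_PER_YEAR)

for bits in (56, 128, 256):
    years = years_to_try_all_keys(bits)
    print(f"{bits}-bit key: about {years:.2e} years to try every key")
```

Every extra bit doubles the number of keys, so the jump from 128 to 256 bits is far bigger than "twice as hard".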

The encryption algorithm we’ll be using today is called AES. It stands for Advanced Encryption Standard and was adopted by the U.S. government in 2001 as the standard for encrypting electronic data. We’ll be encrypting with a 256-bit key derived with PBKDF2, which stands for Password-Based Key Derivation Function 2. There is a lot of jargon in that sentence, but simply put, we will be using a commonly used encryption technique with a key we create from a password that we set.

What you’ll need

In this tutorial we will be coding in Python 3 and using the Pycryptodome library. Pycryptodome is a Python package that implements a lot of different cryptographic functions. This is awesome because we get to play around with a bunch of different encryption algorithms and tools without having to code them from scratch.

You are going to need to have Python installed on your computer and any text editor you are comfortable using. If you are on Windows 10 and need help getting set up, follow this guide that will take you through installing the Windows Subsystem for Linux (WSL) and get you up and running with Python. Just complete up through the “Run a simple Python program” section. WSL is a relatively new Windows 10 feature that allows you to run Linux programs natively in a Windows version of the Bash shell. When you install it you get an awesome Ubuntu-based command line terminal where you can do all your development. I recommend doing this tutorial with WSL since it’s what I developed this program in and because I believe it’s a really good way to start easing yourself into the Linux ecosystem, which is important to learn if you want to get into web or software development.

Mac Users

Older versions of macOS come with Python 2.7 out of the box (recent versions no longer include it), but we need Python 3. We can install it using Homebrew, a popular package manager for macOS. If you don’t have Homebrew installed on your machine, open up your terminal and run:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

That should get Homebrew installed. Now let’s use it to get Python 3. In the terminal run:

brew install python3

This will tell Homebrew to grab the latest version of Python, and you should be good to go on with the rest of the tutorial.

Once you have your development environment ready to go, type the following into your command prompt and hit enter:

pip3 install pycryptodome

This command invokes pip, the Python package manager, and tells it to search for and install pycryptodome and any dependencies so we can start coding some awesome encryption scripts. Once pip finishes the installation we will be ready to start.

Now, with your text editor of choice, let’s create a new file. I named mine crypto.py.

The encryption functions

Below, I’m going to start defining some functions for our encryption, then I’ll explain what they do line-by-line.

from Crypto.Cipher import AES
from Crypto import Random
from Crypto.Protocol.KDF import PBKDF2
import os

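def encrypt(text, key):
    # Pad with null bytes so the length is a multiple of the 16-byte AES block size
    pad = lambda s: s + b"\0" * (AES.block_size - len(s) % AES.block_size)
    text = pad(text)
    # Generate a fresh random 16-byte initialization vector (IV) for this message
    initialization = Random.new().read(16)
    cipher = AES.new(key, AES.MODE_CBC, initialization)
    # Prepend the IV so decrypt() can read it back from the front of the output
    return initialization + cipher.encrypt(text)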

In the first four lines we import everything we will be using throughout the program. From the Pycryptodome package we import the AES and Random modules and the PBKDF2 function. We also import the os module, which is part of Python’s standard library, so there is nothing extra to install.

Next we define our main encryption function. The function takes two inputs: the text or data we want to encrypt and a key used for encryption. We handle key derivation a little later. In this function we need to handle a few important things. First we need to pad the text to ensure the size of the data is a multiple of the block size of the algorithm. The block size of AES is 16 bytes, so our data must be padded so that its length is a multiple of 16 bytes. You can pad your data with any byte, as long as you can tell the padding apart from the real data when you decrypt. In this example we are padding with the null character, “\0”.
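To make the padding arithmetic concrete, here is a small sketch that applies the same rule as the pad lambda to a few input lengths. It uses a plain constant instead of `AES.block_size` so it runs without Pycryptodome, and `null_pad` is just an illustrative name.

```python
BLOCK_SIZE = 16  # AES block size in bytes (the value of AES.block_size)

def null_pad(data):
    """Same rule as the pad lambda: append null bytes up to the next multiple of 16."""
    return data + b"\0" * (BLOCK_SIZE - len(data) % BLOCK_SIZE)

for length in (5, 16, 20):
    padded = null_pad(b"a" * length)
    print(length, "bytes ->", len(padded), "bytes after padding")
```

Note that an input that is already a multiple of 16 bytes still gets a full extra block of padding, because `len(s) % 16` is 0 and the rule then adds 16 bytes.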

Second we create our initialization vector (IV). An IV is a string of random bits that gets mixed in with your data during encryption, so that encrypting the same data twice with the same key produces different ciphertext each time. This makes the encryption harder to crack. Here we create a new variable called initialization and store 16 random bytes in it.
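If you want to see what an IV looks like, here is a tiny sketch. It uses `os.urandom` from the standard library as a stand-in for the `Random.new().read(16)` call in our function, and `new_iv` is a hypothetical helper name.

```python
import os

IV_SIZE = 16  # one AES block

def new_iv():
    """Return 16 fresh random bytes from the operating system's secure random source."""
    return os.urandom(IV_SIZE)

iv_one = new_iv()
iv_two = new_iv()
print(iv_one.hex())
print(iv_two.hex())  # different from the first one on every run
```

The IV does not need to be kept secret, which is why we can store it right next to the ciphertext. It just needs to be new and unpredictable for every encryption.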

Now we can create our AES encryption object. Here I name the object cipher, call Pycryptodome’s AES.new function and pass it our yet-to-be-generated key, tell it we want CBC mode, and finally pass it our IV. Don’t worry about understanding CBC mode for now. We are saying we want to use Cipher Block Chaining. CBC is a bit too advanced a topic to explain briefly here, but I encourage you to research it on your own if you are interested.
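For the curious, here is a toy sketch of the chaining idea behind CBC. The "block cipher" below is a made-up stand-in (XOR with the key plus a byte rotation) that is not secure at all and has nothing to do with how AES works internally. It is only there to show how each block gets mixed with the previous ciphertext block before it is encrypted.

```python
BLOCK = 16  # bytes per block, same as AES

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt_block(block, key):
    """Toy stand-in for AES (NOT secure): XOR with the key, then rotate one byte left."""
    mixed = xor_bytes(block, key)
    return mixed[1:] + mixed[:1]

def toy_decrypt_block(block, key):
    """Undo toy_encrypt_block: rotate one byte right, then XOR with the key."""
    return xor_bytes(block[-1:] + block[:-1], key)

def cbc_encrypt(plaintext, key, iv):
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be padded to a multiple of the block size")
    previous, output = iv, b""
    for start in range(0, len(plaintext), BLOCK):
        # Mix each plaintext block with the previous ciphertext block (or the IV) first
        mixed = xor_bytes(plaintext[start:start + BLOCK], previous)
        previous = toy_encrypt_block(mixed, key)
        output += previous
    return output

def cbc_decrypt(ciphertext, key, iv):
    previous, output = iv, b""
    for start in range(0, len(ciphertext), BLOCK):
        block = ciphertext[start:start + BLOCK]
        output += xor_bytes(toy_decrypt_block(block, key), previous)
        previous = block
    return output

key = bytes(range(16))
iv = bytes(range(16, 32))
message = b"A" * 32  # two identical plaintext blocks
ciphertext = cbc_encrypt(message, key, iv)
print(ciphertext[:16] != ciphertext[16:])           # identical blocks encrypt differently
print(cbc_decrypt(ciphertext, key, iv) == message)  # decryption reverses the chaining
```

Because every block depends on the one before it, two identical chunks of a file do not show up as two identical chunks of ciphertext.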

Our encryption object is now constructed! Next we finish this function off by returning the IV + the ciphertext which we get by running cipher.encrypt on our padded text.

Encrypting files

OK, so we just wrote a function that encrypts a string of bytes, but now we want to encrypt files. How can we do this? We need to write a function that reads a file, saves its contents as a byte string, and then runs our encryption function on it. Then we write that encrypted string to a new file and, voila, file encryption! Let’s define that function below:

def file_encrypt(file_name, key):
    with open(file_name, 'rb') as input_file:
        plaintext = input_file.read()
    encrypted_text = encrypt(plaintext, key)
    with open(file_name + ".enc", 'wb') as output_file:
        output_file.write(encrypted_text)
    os.remove(file_name)

This function also takes two parameters: a string that will be the name of the file we wish to encrypt and, again, the key that we will generate shortly. First we open the file, read its contents, and run our previously defined encryption function on it. Then we create a new file and write the encrypted ciphertext to it. We save the file with the same file name as the original but with an added .enc extension to indicate to us that it is encrypted. Now all that is left to do is to delete the original unencrypted file from our directory using the os.remove() function.
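One thing to watch out for is that the original file is deleted at the end, so a crash half-way through writing could leave you with a broken output file. Here is one possible sketch of a more careful pattern. `transform_file` is a hypothetical helper of mine, not part of the tutorial code: it writes to a temporary file first, renames it into place, and only then removes the source.

```python
import os
import tempfile

def transform_file(source_path, target_path, transform):
    """Write transform(contents of source_path) to target_path, then delete the source.

    The output goes to a temporary file first and is renamed into place, so a
    failure part-way through leaves the original file untouched.
    """
    with open(source_path, "rb") as source:
        data = source.read()
    result = transform(data)  # if this raises, nothing has been written or deleted

    target_dir = os.path.dirname(os.path.abspath(target_path))
    fd, temp_path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as temp_file:
            temp_file.write(result)
        os.replace(temp_path, target_path)  # rename within the same directory
    except BaseException:
        if os.path.exists(temp_path):
            os.remove(temp_path)
        raise
    os.remove(source_path)
```

With our function it could be called as `transform_file("top_secret.txt", "top_secret.txt.enc", lambda data: encrypt(data, key))`.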

The decryption functions

We now have a mechanism to “lock” files, but what good is a lock that cannot be unlocked? As I explained earlier, encryption needs to be a reversible process. Let’s build a pair of functions that will allow us to decrypt our files.

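def decrypt(encrypted_text, key):
    # The first 16 bytes are the IV that encrypt() put at the front
    initialization = encrypted_text[:AES.block_size]
    cipher = AES.new(key, AES.MODE_CBC, initialization)
    # Everything after the IV is the actual ciphertext
    plaintext = cipher.decrypt(encrypted_text[AES.block_size:])
    # Remove the null-byte padding added during encryption
    return plaintext.rstrip(b"\0")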

Here we take two parameters: our encrypted text and the key we used for encryption. It is very important that we pass in the same key we previously used to encrypt. With the wrong key Python will not throw an error here; the function will simply return unreadable garbage instead of the original data.

The first thing we need to do is identify our IV from our encrypted text. This is easy since we saved it at the front of the string. Since we know its size, we can perform a string slice only copying the first 16 bytes of the string (the AES block size).

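Now that we have our IV we can create our AES object again and use it to decrypt the message, omitting the first 16 bytes. The result is the plaintext padded with null characters (“\0”). Finally we can return the original text by stripping off the null characters. Keep in mind that this also strips any null bytes that were genuinely at the end of the original data, so this simple scheme is best suited to text files.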
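If you want padding that can always be removed exactly, one common alternative to null padding is PKCS#7, where every padding byte stores the padding length. The sketch below is not what our tutorial code uses. It is a hand-written illustration with hypothetical helper names, showing where null padding falls short and how PKCS#7 avoids the problem.

```python
BLOCK_SIZE = 16  # AES block size in bytes

def pkcs7_pad(data):
    """Append N bytes that each have the value N, where N is 1 to 16."""
    pad_len = BLOCK_SIZE - len(data) % BLOCK_SIZE
    return data + bytes([pad_len]) * pad_len

def pkcs7_unpad(padded):
    """Read the padding length from the last byte and remove exactly that many bytes."""
    if not padded or len(padded) % BLOCK_SIZE:
        raise ValueError("invalid padded length")
    pad_len = padded[-1]
    if not 1 <= pad_len <= BLOCK_SIZE or padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding")
    return padded[:-pad_len]

original = b"binary data ending in nulls\0\0"
null_padded = original + b"\0" * (BLOCK_SIZE - len(original) % BLOCK_SIZE)
print(null_padded.rstrip(b"\0") == original)         # False: the real nulls were stripped too
print(pkcs7_unpad(pkcs7_pad(original)) == original)  # True
```

Pycryptodome also ships `pad` and `unpad` helpers in `Crypto.Util.Padding`, so in a real program you would probably reach for those rather than writing your own.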

Decrypting files

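It turns out that decrypting files is just as straightforward as encrypting them. We perform the same basic steps. First we read the encrypted text, then we pass it to our decryption function and write the plaintext output to a new file. The output file name is the encrypted file’s name with its last four characters (the .enc extension) sliced off. Finally we delete the encrypted file.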

def file_decrypt(file_name, key):
    with open(file_name, 'rb') as input_file:
        encrypted_text = input_file.read()
    decrypted_text = decrypt(encrypted_text, key)
    with open(file_name[:-4], 'wb') as output_file:
        output_file.write(decrypted_text)
    os.remove(file_name)
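The `file_name[:-4]` slice assumes the name really ends in .enc. As a small, optional sketch, a helper like the hypothetical `decrypted_name` below could check that first, so a mistyped file name does not silently lose its last four characters.

```python
ENCRYPTED_SUFFIX = ".enc"

def decrypted_name(file_name):
    """Return the output name for an encrypted file, e.g. 'notes.txt.enc' -> 'notes.txt'."""
    if not file_name.endswith(ENCRYPTED_SUFFIX) or file_name == ENCRYPTED_SUFFIX:
        raise ValueError(f"expected a name ending in {ENCRYPTED_SUFFIX!r}, got {file_name!r}")
    return file_name[:-len(ENCRYPTED_SUFFIX)]

print(decrypted_name("top_secret.txt.enc"))  # top_secret.txt
```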

Generating keys

Finally, the moment you’ve all been waiting for… key derivation! Thus far we’ve built a functioning lock but we have no key to lock it with. There are many key derivation functions we can use that offer various trade-offs between speed and security. If a function derives keys too quickly, an attacker can try huge numbers of password guesses per second, so the password could be easily cracked. By the same token, if you choose a function that is very slow and therefore harder to attack, it may be too resource-intensive to be used practically. For our purposes the PBKDF2 algorithm offers a good speed-vs-security balance. This algorithm will give us a 32 byte (256 bit) key given a user-supplied password with salt.

Yes, I said salt and, no, this didn’t turn into a recipe blog post. In cryptography, salt is a random string of bytes that is fed into the algorithm along with your password and makes it more difficult for your password to be cracked. It also allows multiple users to enter the same password and generate unique keys. Ideally you should generate a unique salt for every user to increase password security. For our purposes, however, we are using a “fixed” or hard-coded salt. This means we will be using the same salt for every key generated. Again, this is not safe and should not be done for serious implementations. Do not use this program to encrypt files you actually need to keep 100% safe. This tutorial is for educational purposes only.
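To see the speed-vs-security trade-off for yourself, here is a small timing sketch. It uses `hashlib.pbkdf2_hmac` from Python’s standard library rather than Pycryptodome, and the password, salt, and iteration counts are values I made up for the demonstration. The exact timings will depend on your machine.

```python
import hashlib
import time

password = b"correct horse battery staple"  # example password, made up for this demo
salt = b"\x00" * 16  # fixed salt only so the two runs are comparable

for iterations in (1_000, 100_000):
    start = time.perf_counter()
    key = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)
    elapsed = time.perf_counter() - start
    print(f"{iterations:>7} iterations: {len(key)}-byte key in {elapsed:.4f} s")
```

More iterations means every single password guess costs an attacker more time, but it also makes our own program slower each time it derives the key.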

Now let’s write our key generator!

def generate_key(password):
    salt = b'\x83\xdb\xb9\xd3\xdc"\x1e\x0ee"\x0c\xf0=5\xab_\x18\xd7\xd2\x98\x92Q.\xbd\x9cK\x96\x93-J\x08\xe0'
    return PBKDF2(password, salt, dkLen=32)

This probably didn’t need its own user-defined function but I wrote it this way to make what we’re doing a little clearer. Pycryptodome does all the heavy lifting for us here. All we have to do is supply the password in the form of a string, the salt, and the key length we want in bytes (dkLen=32). You can generate your own salt by opening your Python console and typing the following commands:

from Crypto.Random import get_random_bytes
get_random_bytes(32)

That command will generate a random byte string based on the length you specify, in this case, 32 bytes. Now all we do is feed this to Pycryptodome’s PBKDF2 function and return the output which should be a 32 byte (256 bit) key!
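If you would like to move away from the hard-coded salt, one possible approach is to generate a new random salt each time you encrypt and store it alongside the ciphertext, much like we do with the IV. The sketch below only shows the key derivation side. It uses the standard library’s `hashlib.pbkdf2_hmac` instead of Pycryptodome’s PBKDF2, so its keys will not match the ones from our generate_key function, and the helper names and iteration count are my own assumptions.

```python
import hashlib
import os

SALT_SIZE = 16
ITERATIONS = 100_000  # illustrative value, not taken from the tutorial code

def derive_key(password, salt):
    """Derive a 32-byte key from a password string and a salt."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS, dklen=32)

def new_salt_and_key(password):
    """Create a fresh random salt and the key that goes with it."""
    salt = os.urandom(SALT_SIZE)
    return salt, derive_key(password, salt)

salt_a, key_a = new_salt_and_key("hunter2")
salt_b, key_b = new_salt_and_key("hunter2")
print(key_a != key_b)                          # same password, different salts, different keys
print(derive_key("hunter2", salt_a) == key_a)  # same password and salt reproduce the key
```

The salt is not a secret. To decrypt, you would read it back from the file and call `derive_key` with the password the user types in.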

Finishing touches

If you have been following along this far go ahead and pat yourself on the back. We just made the inner workings of a pretty sweet encrypting tool! Now all we need to do is make a simple driver to test it all out.

Here is the completed code:

from Crypto.Cipher import AES
from Crypto import Random
from Crypto.Protocol.KDF import PBKDF2
import os

# *** Encryption Functions ***
def encrypt(text, key):
    pad = lambda s: s + b"\0" * (AES.block_size - len(s) % AES.block_size)
    text = pad(text)
    initialization = Random.new().read(16)
    cipher = AES.new(key, AES.MODE_CBC, initialization)
    return initialization + cipher.encrypt(text)

def file_encrypt(file_name, key):
    with open(file_name, 'rb') as input_file:
        plaintext = input_file.read()
    encrypted_text = encrypt(plaintext, key)
    with open(file_name + ".enc", 'wb') as output_file:
        output_file.write(encrypted_text)
    os.remove(file_name)

# *** Decryption Functions ***
def decrypt(encrypted_text, key):
    initialization = encrypted_text[:AES.block_size]
    cipher = AES.new(key, AES.MODE_CBC, initialization)
    plaintext = cipher.decrypt(encrypted_text[AES.block_size:])
    return plaintext.rstrip(b"\0")

def file_decrypt(file_name, key):
    with open(file_name, 'rb') as input_file:
        encrypted_text = input_file.read()
    decrypted_text = decrypt(encrypted_text, key)
    with open(file_name[:-4], 'wb') as output_file:
        output_file.write(decrypted_text)
    os.remove(file_name)

# This function takes an alphanumeric string as a password and passes it to pycryptodome's PBKDF2 
# algorithm to generate an encryption key 
def generate_key(password):
    salt = b'\x83\xdb\xb9\xd3\xdc"\x1e\x0ee"\x0c\xf0=5\xab_\x18\xd7\xd2\x98\x92Q.\xbd\x9cK\x96\x93-J\x08\xe0'
    return PBKDF2(password, salt, dkLen=32)

# Name of the file to encrypt or decrypt
secretfile = "top_secret.txt"

password = input("Enter a password for encryption:")
key = generate_key(password)

option = int(input("Enter 1 to encrypt file.\nEnter 2 to decrypt file.\nSelection:"))
if option == 1:
    file_encrypt(secretfile, key)
    print("file encrypted!")

elif option == 2:
    encrypted = secretfile + ".enc"
    file_decrypt(encrypted, key)
    print("file decrypted!")

else:
    print("Invalid selection, nothing was changed.")

At the very end I included a simple test to see it all in action. The string secretfile contains the name of the file you want to encrypt.
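If you get tired of editing the file name in the script, one possible next step is to read it from the command line and to hide the password while it is typed. This is only a sketch using the standard library’s argparse and getpass modules. The argument names and the `parse_args` and `read_password` helpers are my own choices, not part of the program above.

```python
import argparse
import getpass

def parse_args(argv=None):
    """Parse the mode and file name, e.g. `python3 crypto.py encrypt top_secret.txt`."""
    parser = argparse.ArgumentParser(description="Encrypt or decrypt a file.")
    parser.add_argument("mode", choices=["encrypt", "decrypt"], help="what to do with the file")
    parser.add_argument("file_name", help="path of the file to process")
    return parser.parse_args(argv)

def read_password():
    """Prompt for the password without echoing it to the screen, unlike input()."""
    return getpass.getpass("Enter a password: ")

# Passing a list here stands in for real command line arguments
args = parse_args(["encrypt", "top_secret.txt"])
print(args.mode, args.file_name)
```

In the real script you would call `parse_args()` with no arguments so it reads from the command line, then pass `args.file_name` and a key derived from `read_password()` to file_encrypt or file_decrypt.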

To run the code from the command line make sure you are in the directory the file is located in and run the following command: python crypto.py.

If you have older versions of Python installed, you may have to run python3 crypto.py to specify that you want to use Python 3.

Feel free to modify the code in any way you want! Remember, this code isn’t 100% safe. Do not use it to secure sensitive documents or anything you do not want to risk corrupting!
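One example of a modification: the program has no way to notice a wrong password, so decrypting with one writes garbage and deletes the encrypted file. A possible way to catch this is to attach an HMAC tag to the ciphertext and check it before decrypting. The sketch below uses only the standard library, and the helper names and the separate `mac_key` are my own assumptions (in practice you might derive 64 bytes from the password and use half as the encryption key and half as the MAC key).

```python
import hashlib
import hmac

TAG_SIZE = 32  # size of an HMAC-SHA256 tag in bytes

def add_tag(mac_key, ciphertext):
    """Append an HMAC-SHA256 tag computed over the ciphertext."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag

def check_and_strip_tag(mac_key, blob):
    """Verify the tag and return the ciphertext, or raise if the key or data is wrong."""
    ciphertext, tag = blob[:-TAG_SIZE], blob[-TAG_SIZE:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong password or corrupted file")
    return ciphertext

ciphertext = b"\x01" * 32  # placeholder for the output of encrypt()
blob = add_tag(b"k" * 32, ciphertext)
print(check_and_strip_tag(b"k" * 32, blob) == ciphertext)  # True
```

Calling `check_and_strip_tag` with a different key raises a ValueError, which gives the program a chance to stop before it overwrites or deletes anything.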

Great job sticking with the tutorial and seeing it through to the end! I hope you learned a little bit about how encryption works and that I was able to pique your interest in cryptography and information security. It’s a really promising career path that is in high demand. I encourage you to do a little research and come up with ways to make our code more secure. Maybe look into password hashing and ways to validate that the user entered the correct password. Well that’s enough out of me. Thanks for reading and catch you next time!

About the author

Nic Metras is a recent Computer Science grad from San Diego, CA. When he isn't in front of a keyboard you might be able to find him enjoying a refreshing beverage from any number of craft breweries!
