Embeddings and all the Black Magic

First steps with Embeddings and OpenAI: from API setup to a working Jupyter Notebook.

Photo by Arnór Ingi Júlíusson / Unsplash

The first AI concept that truly fascinates me is embeddings.

There is so much literature about them that I won't try to explain them again. For that, open up Medium and search for an introduction to the subject. I found this article to be a good starting point.

Here comes the black magic:

Embeddings capture the meaning of a text-based source. And that meaning is surprisingly language-agnostic.

You can embed an English document
and search it with an Italian query.

In my first experiment, I indexed a markdown-based documentation project that I wrote in English. Then, I managed to find useful information with a query that I wrote in my local Italian dialect.

I was stunned.
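
To give a feel for what "searching" means here, below is a minimal sketch of the usual approach: compare the query's vector with each document's vector using cosine similarity, and rank the documents by score. The vectors, file names, and helper functions are made up for illustration; real embeddings come from a model and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vector, document_vectors):
    """Return document names sorted from most to least similar to the query."""
    scores = {
        name: cosine_similarity(query_vector, vector)
        for name, vector in document_vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 3-dimensional "embeddings" of three English documents.
documents = {
    "setup-guide.md": [0.9, 0.1, 0.0],
    "billing-faq.md": [0.1, 0.9, 0.2],
    "changelog.md": [0.0, 0.2, 0.9],
}

# Pretend this is the embedding of an Italian question about installation.
query = [0.8, 0.2, 0.1]

print(rank_documents(query, documents))
```

Because the comparison happens between vectors and not between words, the language of the query does not matter as long as the model places similar meanings close to each other.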

Embeddings For Dummies

Embeddings are not something you compute by hand; you would instead buy them from a provider such as OpenAI, which is possibly the most famous. There are also many open-source alternatives, but hosting and running them yourself can require a heck of a piece of hardware.

I am an eager consumer of ChatGPT and so I've decided to start with OpenAI.

Set up your API Key

Initially, I got confused because I couldn't find any "API Tokens" section in the ChatGPT interface, and I was puzzled about how to use it programmatically 🧐.

It took some googling to figure out that ChatGPT and the OpenAI API platform are two different things. I hope you appreciate me sharing this with you, even if it casts a silly spotlight on my weekend-induced naivety.

So here is the recipe:

  • Go to OpenAI and log in
  • Navigate to Settings / Billing
  • Add some funds: you need a minimum deposit of 5€ to be able to use the API!
  • Move to "API Keys" and generate your key

👉 I tried to generate the key before making the money transfer, but I got "insufficient funds" even though I could see some trial credits. Not quite what you would expect.

Once you have your key, create a .env file in your JupyterLab project (check out the repo here) and store the key:

# .env
OPENAI_API_KEY=xxx

And restart your project:

make restart
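
From a notebook cell, the key then needs to be read back. The snippet below is only a sketch: it assumes the key is either exposed as an environment variable after the restart or still readable from the .env file in the working directory. The helper names parse_env_text and get_api_key are hypothetical and not part of the linked repo.

```python
import os


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(env_path=".env"):
    """Prefer the process environment; fall back to reading the .env file."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key and os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as handle:
            key = parse_env_text(handle.read()).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```

Failing loudly when the key is missing saves you from chasing a confusing authentication error later on.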

Generate Embeddings

Generating embeddings is just a matter of turning text into an array of numbers.
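
As a rough idea of what that looks like with the official openai Python package, here is a minimal sketch. It is not the code from the notebook: the embed_texts and demo helpers are hypothetical, and the model name text-embedding-3-small is an assumption you can swap for any embedding model available to your account.

```python
def embed_texts(client, texts, model="text-embedding-3-small"):
    """Return one embedding (a list of floats) per input text."""
    response = client.embeddings.create(model=model, input=texts)
    # Each returned item carries the index of the input it belongs to;
    # sorting by it keeps the vectors aligned with the texts.
    items = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in items]


def demo():
    """Not called automatically: needs the openai package, credits, and network access."""
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    vectors = embed_texts(client, ["How do I install the project?"])
    print(len(vectors), len(vectors[0]))
```

Calling demo() in a notebook cell prints how many vectors came back and how long each one is. Passing the client in as a parameter also makes the function easy to try out with a fake client, without spending credits.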

Check out the step-by-step source code here:

learning-python/notebooks/openai-embeddings.ipynb at main · marcopeg/learning-python
A containerized project to play around with Python and Jupiter Notebook - marcopeg/learning-python