AI is one of the most in-demand technologies today. However, there has long been no convenient environment for creating AI-driven applications. That changed when Spice.AI was introduced to the AI development community. In this overview, we follow the Spice.AI step-by-step guide and demonstrate what it currently offers.
Note: keep in mind that the Spice.AI project is in active alpha-stage development (we review the v0.6-alpha version here) and is not intended for production use until its 1.0-stable release.
What is Spice.AI?
Spice.AI is an open-source, portable runtime for training and using deep learning on time series data, which allows users to create Reinforcement Learning models with minimal effort. It is written in Golang and Python, and runs as a container or microservice, with applications calling it over a simple HTTP API. It can be deployed to any public cloud, on-premises, or at the edge.
Spice.AI was designed to help software developers use a reinforcement-learning approach to create intelligent applications without needing deep knowledge of AI/ML frameworks and tools (at least, that is what its authors say!).
First, we’ll take a closer look at Reinforcement Learning. Reinforcement Learning is a machine learning training method based on rewarding desired actions and punishing undesired ones. In general, a reinforcement learning agent learns to make a sequence of decisions that maximizes its cumulative reward.
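As a toy illustration of this reward-driven idea, here is a bandit-style Python sketch, far simpler than the deep models Spice.AI trains: an agent tries each action, tracks average rewards, and then exploits the best estimate. The action names and reward function are hypothetical.

```python
def estimate_best_action(actions, reward_fn, steps=30):
    """Try every action once, then repeatedly pick the best average reward."""
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(steps):
        untried = [a for a in actions if counts[a] == 0]
        # Explore untried actions first, then exploit the best estimate so far.
        action = untried[0] if untried else max(actions, key=lambda a: totals[a] / counts[a])
        totals[action] += reward_fn(action)
        counts[action] += 1
    return max(actions, key=lambda a: totals[a] / counts[a])

# A made-up reward: watering halfway is the only rewarded action.
best = estimate_best_action(
    ["close_valve", "open_valve_half", "open_valve_full"],
    lambda a: 1.0 if a == "open_valve_half" else 0.0,
)
```

After trying each action once, the agent settles on the only rewarded one; real RL additionally has to account for state and delayed rewards, which is what the frameworks under Spice.AI handle.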
One of the most important concepts in training is the Spice.AI pod. A Spice.AI pod is a set of configurations and parameters used for training. To train a custom model, you will need to create your own pod. To understand how to use and create pods, see the specifications on the official website. This is the trickiest part of creating a custom model, because there is no convenient way to generate a pod automatically (hopefully this will be addressed in future releases).
To get started with Spice.AI, you need to have Docker installed. You also must set up the Spice CLI by running this cURL command:
curl https://install.spiceai.org | /bin/bash
And that’s it! You are ready to go!
After the installation is complete, you will be ready to explore the existing testing environments. There are several of them:
- GitHub – spiceai/quickstarts: Quickstart projects to get up and running with Spice.AI quickly 🚀
- GitHub – spiceai/samples: Learn about Spice.AI with in-depth samples
You then clone the repo and select one of the quickstarts presented.
Let’s look at one of the existing samples called “Gardener”. This sample uses temperature and moisture sensor data for watering a garden. The “Gardener” is a typical example of a Reinforcement Learning problem. We have an agent (gardener), an environment (garden), a state (moisture and temperature values), a set of possible actions (open or close the valve), and rewards for those actions.
First, we must clone the repository with the Spice.AI samples:
git clone https://github.com/spiceai/samples.git
Then, go to the “gardener” repository and download the pod for training:
cd samples/gardener
spice add samples/gardener
The spice add command downloads the data and the pod for the current sample from the server into the “spicepods” directory. So, if you want to use custom data, you will need to create the pod manually. We can find the following parameters in the pod:
dataspaces:
  - from: sensors
    name: garden
    fields:
      - name: temperature
      - name: moisture
    data:
      connector:
        name: file
        params:
          path: data/garden_data.csv
      processor:
        name: csv
This particular Spice.AI dataspace uses a csv data processor and a file data connector to extract the temperature and moisture columns from data/garden_data.csv. You can learn more about dataspaces in the Concepts section of the Spice.AI documentation.
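Conceptually, the file connector reads the file and the csv processor turns its rows into named fields. A rough standalone Python equivalent (with made-up sample data, not Spice.AI internals) looks like this:

```python
import csv
import io

# Made-up sample rows in the shape of data/garden_data.csv.
sample = io.StringIO(
    "time,temperature,moisture\n"
    "1,21.5,0.40\n"
    "2,23.0,0.35\n"
)

# Extract only the fields the dataspace declares.
observations = [
    {"temperature": float(row["temperature"]), "moisture": float(row["moisture"])}
    for row in csv.DictReader(sample)
]
```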
The possible actions to provide as recommendations are defined in the actions section of the manifest:
actions:
  - name: close_valve
  - name: open_valve_half
  - name: open_valve_full
Spice.AI will train using these three actions to provide a recommendation on which one to take when asked.
Spice.AI learns which action to recommend by rewarding or penalizing each action using a Reward definition. Review the rewards section of the manifest:
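The original article does not reproduce the rewards section itself, so here is a hedged sketch of its general shape, based on the pod structure shown above and the state expressions discussed in the text. The exact keys, nesting, and expressions in the real gardener pod may differ, and the moisture thresholds below are purely illustrative:

```yaml
training:
  rewards:
    - reward: close_valve
      # Illustrative expression: reward closing the valve while moisture is ample.
      with: reward = 1 if next_state["sensors_garden_moisture"] > 0.40 else -1
    - reward: open_valve_half
      with: reward = 1 if current_state["sensors_garden_moisture"] < 0.40 else -1
    - reward: open_valve_full
      with: reward = 1 if current_state["sensors_garden_moisture"] < 0.25 else -1
```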
This section tells Spice.AI how to reward each action, given the state at that step. These rewards are defined by simple Python expressions that assign a value to reward. A higher value means that Spice.AI will learn to take that action more frequently as it trains. You can use values from your dataspaces to calculate these rewards; they are accessed with expressions such as current_state["sensors_garden_moisture"], which are used to either reward or penalize opening or closing the watering valve. Notice how next_state["sensors_garden_moisture"] is compared with current_state["sensors_garden_moisture"] in the reward for close_valve. This allows Spice.AI to gain a sense of directionality from its data.
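To make the directionality point concrete, here is a small standalone Python sketch (not Spice.AI internals) of how a close_valve reward expression comparing next_state to current_state might evaluate; the 0.30 moisture threshold is an illustrative assumption, not a value from the real pod:

```python
# Two consecutive observations, using the field naming from the dataspace above.
current_state = {"sensors_garden_moisture": 0.40}
next_state = {"sensors_garden_moisture": 0.25}

def close_valve_reward(current_state, next_state):
    """Reward closing the valve only while moisture stays at a safe level."""
    drying = next_state["sensors_garden_moisture"] < current_state["sensors_garden_moisture"]
    too_dry = next_state["sensors_garden_moisture"] < 0.30  # illustrative threshold
    return 1.0 if drying and not too_dry else -1.0

print(close_valve_reward(current_state, next_state))  # the garden dried out too much, so closing is penalized
```

Comparing the two states is what lets the reward distinguish "moisture is comfortably decreasing" from "the garden is drying out".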
We are ready to start the training process, but first, let’s see what we have in the other files in the directory.
garden.py describes the behavior of the garden environment. It is a toy example of how moisture and temperature change during the day.
main.py gets recommendations from Spice.AI and interacts with the environment: either opening or closing the water flow.
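As an illustration of how a script like main.py could consume recommendations, here is a hedged Python sketch; the endpoint path is an assumption modeled on the dashboard URL used later in this article and may differ between alpha releases, and the JSON shape is likewise assumed:

```python
import json
import urllib.request

# Assumed endpoint shape; verify against your Spice.AI version.
RECOMMENDATION_URL = "http://localhost:8000/api/v0.1/pods/gardener/recommendation"

def parse_recommendation(payload):
    # The runtime is assumed to return JSON naming the recommended action,
    # e.g. {"action": "open_valve_half", "confidence": 0.9}.
    action = payload.get("action")
    if action not in ("close_valve", "open_valve_half", "open_valve_full"):
        raise ValueError(f"unexpected action: {action!r}")
    return action

def get_recommended_action(url=RECOMMENDATION_URL):
    # Network call; requires a running Spice.AI runtime.
    with urllib.request.urlopen(url) as response:
        return parse_recommendation(json.load(response))
```

The environment script would then map the returned action name onto its valve-control logic.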
First of all, we need to start the Spice.AI runtime with the spice run command.
Next, we can start the training process by running the spice train gardener command in a new terminal window. We will use the default training properties. Now we can observe the training process at http://localhost:8000/pods/gardener
The graph above shows that the model uses open_valve_full most of the time and that the reward is not stable. Now let's see how it performs in the simulated environment by running python main.py.
The model constantly chooses to open the valve and waste water, even though the rules penalize this. So, as we can see, the model did not train well and kept choosing only one action. Let's increase the number of training episodes.
This time, the reward increased dramatically after 17 episodes, and close_valve was taken in the majority of tries. Let's check how the model behaves in our environment this time.
Now, the model constantly closes the valve until the garden is completely dry. So, increasing the number of episodes didn’t help the model learn the rules correctly.
OK, let's change the learning algorithm from Deep Q-Learning to Vanilla Policy Gradient by specifying the appropriate parameter, and increase the number of episodes further:
spice train gardener --learning-algorithm vpg
Here, we can see smoother reward growth, but the model still selects only the close_valve action. Let's look closer at the rewards function. The reward for closing the valve is significantly higher than the other rewards, and the model selects close_valve all the time because of the simple policy it discovers: choosing close_valve at every step leads to a higher reward sum. Now we can try changing the reward function to see if it helps avoid this kind of overfitting. We can weight the close_valve reward in the same way as the other rewards.
But even with these changes, we see the same situation: the model selects only one action for all observations, and now it selects open_valve_full constantly. Any other changes to the reward function lead to one dominant action being selected while the others are ignored.
The first advantage of Spice.AI is the simplicity of creating Reinforcement Learning models with a single command, spice train. It also lets us train a model without writing any training code. So far, three Reinforcement Learning algorithms are available for training: Deep Q-Learning, Vanilla Policy Gradient, and Soft Actor-Critic, which we can select by specifying the corresponding parameter. On the other hand, setting up pods for training from scratch can be time-consuming and unintuitive. Even using the existing reward function, we didn't achieve the desired action policy in the Gardener sample environment, which shows that more careful fine-tuning is needed to get satisfying results.
Additionally, Spice.AI has a very convenient interface. We can get model predictions via API requests, which greatly simplifies development. Spice.AI also has a simple dashboard that gives us a clear picture of model training.
Finally, we need to consider that experiments were made using the alpha version of the project, which is in the active stage of development. Even so, the results are very promising and we will continue to keep track of project progress.
This article was written by Danylo Kosmin, Machine Learning Engineer at Akvelon.