Developing Intelligent Apps with Spice.AI

AI is one of the most in-demand technologies today. However, until recently there hasn't been a convenient environment for creating AI-driven applications. That changed when Spice.AI was introduced to the AI development community. In this overview, we will follow the Spice.AI step-by-step guide and demonstrate what it can currently offer.

Note: the Spice.AI project is in active alpha-stage development (the v0.6-alpha version is reviewed here) and is not intended for production use until its 1.0-stable release.

What is Spice.AI?

Spice.AI is an open-source, portable runtime for training and using deep learning on time series data, which allows users to create Reinforcement Learning models with minimal effort. It is written in Golang and Python and runs as a container or microservice that applications call through a simple HTTP API. It can be deployed to any public cloud, on-premises, or at the edge.

Spice.AI was designed to help software developers use a reinforcement-learning approach to create intelligent applications without needing deep knowledge of AI/ML frameworks and tools (at least, that is what the project's creators say!).

First, let's take a closer look at Reinforcement Learning. Reinforcement Learning is a machine learning training method based on rewarding desired actions and punishing undesired ones. In general, a reinforcement learning agent learns to make a sequence of decisions that maximizes the cumulative reward.

One of the most important parts of training is the Spice.AI pod. A Spice.AI pod is a set of configurations and parameters used for training. To train a custom model, you will need to create your own pod. To understand how to create and use pods, refer to the specifications on the official website. This is the trickiest part of creating a custom model because there is no convenient way to generate a pod automatically (hopefully this will be addressed in future releases).

Getting started

To get started with Spice.AI, you need to have Docker installed. You also need to set up the Spice CLI by running the following cURL command:

curl https://install.spiceai.org | /bin/bash

And that’s it! You are ready to go!

After the installation is complete, you will be ready to explore the existing sample environments.

You then need to clone the samples repository and select one of the available quickstarts.

Quick Starts

Let’s look at one of the existing samples called “Gardener”. This sample uses temperature and moisture sensor data for watering a garden. The “Gardener” is a typical example of a Reinforcement Learning problem. We have an agent (gardener), an environment (garden), a state (moisture and temperature values), a set of possible actions (open or close the valve), and rewards for those actions.

First, we need to clone the repository with the Spice.AI samples.
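
Assuming the samples are published in the spiceai/samples repository on GitHub, the clone command is:

git clone https://github.com/spiceai/samples.git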

Then, go to the "gardener" directory and download the pod for training:

cd samples/gardener

spice add samples/gardener

The spice add command downloads the data and the pod for the current sample from the server into the "spicepods" directory. So, if you want to use custom data, you will need to create the pod manually. We can find the following parameters in the pod:
dataspaces:   
  - from: sensors     
    name: garden     
    fields:
      - name: temperature
      - name: moisture     
    data:       
      connector:
        name: file
        params:
          path: data/garden_data.csv
      processor:
        name: csv

This particular Spice.AI dataspace uses a csv data processor and a file data connector to extract the temperature and moisture columns from data/garden_data.csv. You can learn more about dataspaces in the Concepts section of the Spice.AI documentation.
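
To give a sense of the input format, the lines below sketch what data/garden_data.csv could look like. The column names match the dataspace fields, but the time column, the value ranges, and the values themselves are assumptions for illustration, not the actual contents of the sample data:

time,temperature,moisture
1623355200,25.7,0.45
1623358800,27.1,0.42
1623362400,28.4,0.38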

The possible actions to provide as recommendations are defined in the actions section of the manifest:

actions:
  - name: close_valve
  - name: open_valve_half
  - name: open_valve_full

Spice.AI will train using these three actions to provide a recommendation on which one to take when asked.

Spice.AI learns which action to recommend by rewarding or penalizing an action using a Reward definition. Review the rewards section of the manifest:
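
The rewards block of the manifest is not reproduced in this article, so the snippet below is only a rough sketch of its shape. The keys follow the pod specification pattern, but the exact expressions and weights in the real Gardener pod may differ:

training:
  rewards:
    # Hypothetical: reward closing the valve when moisture increased (or held),
    # penalize it when the garden is drying out.
    - reward: close_valve
      with: reward = next_state["sensors_garden_moisture"] - current_state["sensors_garden_moisture"]
    # Hypothetical: reward watering in proportion to how dry the soil is.
    - reward: open_valve_half
      with: reward = (1 - next_state["sensors_garden_moisture"]) * 0.5
    - reward: open_valve_full
      with: reward = 1 - next_state["sensors_garden_moisture"]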

This section tells Spice.AI how to reward each action, given the state at that step. These rewards are defined by simple Python expressions that assign a value to each reward. A higher value means that Spice.AI will learn to take this action more frequently as it trains. You can use values from your Dataspaces to calculate these rewards. They can be accessed with the expression (next_state|current_state)["<from>_<name>_<field>"], for example next_state["sensors_garden_moisture"].

Here, next_state["sensors_garden_moisture"] and current_state["sensors_garden_moisture"] are used to either reward or penalize opening or closing the watering valve. Notice how next_state["sensors_garden_moisture"] is compared to current_state["sensors_garden_moisture"] in the reward for close_valve. This allows Spice.AI to gain a sense of directionality from its data.

We are ready to start the training process, but first, let’s see what we have in the other files in the directory.

garden.py describes the behavior of the garden environment. It is a toy example of how moisture and temperature change during the day.

main.py gets recommendations from Spice.AI and interacts with the environment, either opening or closing the water flow.
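
To give a sense of how main.py talks to the runtime, here is a minimal sketch of a recommendation request in Python. The endpoint path and response fields are assumptions based on the Spice.AI HTTP API and may not match the sample code exactly:

import requests

# Assumed recommendation endpoint for the "gardener" pod; the exact path may
# differ between Spice.AI versions.
RECOMMENDATION_URL = "http://localhost:8000/api/v0.1/pods/gardener/recommendation"

def get_recommendation():
    """Ask the Spice.AI runtime which action it currently recommends."""
    response = requests.get(RECOMMENDATION_URL)
    response.raise_for_status()
    payload = response.json()
    # The response is assumed to contain the recommended action name
    # (e.g. "open_valve_full") and a confidence score.
    return payload.get("action"), payload.get("confidence")

if __name__ == "__main__":
    action, confidence = get_recommendation()
    print(f"Recommended action: {action} (confidence: {confidence})")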

Training

First of all, we need to start the Spice.AI runtime using the spice run command:
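
spice run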

We can start the training process using the spice train gardener command in a new terminal window. We will use the default properties for training. Now we can observe the training process at http://localhost:8000/pods/gardener.

The training graph on the dashboard shows that the model uses open_valve_full most of the time and that the reward is not stable. Now, let's see how it works in the simulated environment by running python main.py.

The model constantly chooses to open the valve and waste water, even though the reward definition penalizes this. As we can see, the model did not train well and kept choosing just one action. Let's increase the number of training episodes.

This time, the reward increased dramatically after 17 episodes, and close_valve was chosen in the majority of tries. Let's check how it works in our environment this time.

Now, the model constantly closes the valve until the garden is completely dry. So, increasing the number of episodes didn’t help the model learn the rules correctly.

OK, let's change the learning algorithm from Deep Q-Learning to Vanilla Policy Gradient by specifying the appropriate parameter, and increase the number of episodes further:

spice train gardener --learning-algorithm vpg

Here we can see smoother reward growth, but the model still selects only the close_valve action.

Let's take a closer look at the reward function. The reward for closing the valve is significantly higher than the other rewards, which is why the model selects close_valve all the time: it has found a trivially simple policy (always choose close_valve) that leads to a higher reward sum. Now we can try to change the reward function to see if it helps avoid this kind of collapse onto a single action. We can weight the close_valve reward in the same way as the other rewards:
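
The exact change made in the experiment is not shown here; purely as an illustration (the scaling factor and expression are assumptions), the re-weighted entry could look something like this:

    # Hypothetical: scale the moisture delta down so close_valve no longer
    # dominates the other rewards.
    - reward: close_valve
      with: reward = 0.5 * (next_state["sensors_garden_moisture"] - current_state["sensors_garden_moisture"])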

But even with these changes, we see the same situation: the model selects only one action for all observations, and now it constantly selects open_valve_full. Any other changes to the reward function also lead to the model picking one dominant action and ignoring the others.

Conclusion

The first advantage of Spice.AI is the simplicity of creating Reinforcement Learning models with a single command, spice train. It also lets us train a model without writing any training code. So far, three Reinforcement Learning algorithms are available for training: Deep Q-Learning, Vanilla Policy Gradient, and Soft Actor-Critic, which can be selected by specifying the corresponding parameter. On the other hand, setting up pods for training from scratch can be time-consuming and unintuitive. Even using the existing reward function, we didn't achieve the desired action policy in the Gardener sample environment, which shows that more precise fine-tuning is needed to get satisfying results.

Additionally, Spice.AI has a very convenient interface. We can get model predictions through API requests, which greatly simplifies development. Spice.AI also has a simple dashboard that gives us a clear picture of model training.

Finally, we need to keep in mind that these experiments were performed with the alpha version of the project, which is under active development. Even so, the results are promising, and we will continue to track the project's progress.

This article was written by Danylo Kosmin, Machine Learning Engineer at Akvelon.