In-Depth: Deploy with Replicate Cog
In this tutorial, we will deploy an endpoint built with Cog, an open-source framework by Replicate for packaging and running machine learning models.
We will deploy the black-forest-labs/FLUX.1-schnell image generation model. FLUX.1-schnell is a 12 billion parameter rectified flow transformer by Black Forest Labs, capable of generating images from text descriptions.
For this example you need a Python environment running on your local machine, a Docker (or Docker-compatible) container runtime installed on your computer, a container registry to store the image created by Cog, and a DataCrunch cloud account to create a deployment.
Docker is a platform for developing, shipping, and running applications. You can learn how to set up Docker from the official Docker documentation. Note that you can use any Docker-compatible container runtime, such as Podman.
We are using Python version 3.12 for this tutorial. You can set up your Python environment as you see fit; we are using a standard virtual environment combined with a bash shell for this example.
You will need to have Cog installed on your computer. Please follow the Cog installation instructions and choose your preferred method of setting up Cog.
For the sake of our example, we will use the nonexistent GitHub registry URL ghcr.io/username/container-image. In the examples, remember to replace this with your own GitHub registry URL.
Please make sure that you have credentials to log in to your registry. You can log in to the GitHub Container Registry with the following command:
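A minimal sketch of the login step, assuming a GitHub personal access token with the write:packages scope is stored in the GITHUB_TOKEN environment variable:

```shell
# Replace "username" with your GitHub username; GITHUB_TOKEN holds a
# personal access token with the write:packages scope.
echo $GITHUB_TOKEN | docker login ghcr.io -u username --password-stdin
```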
Next, we will create a container image. Please create a folder named flux-schnell and save the following files in it, starting with cog.yaml, which defines the dependencies and the predictor class required to run the model:
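A minimal cog.yaml sketch; the package set is an assumption based on what FLUX.1-schnell typically needs, so pin the versions that match your environment:

```yaml
# Illustrative cog.yaml; package versions are assumptions.
build:
  gpu: true
  python_version: "3.12"
  python_packages:
    - "torch"
    - "diffusers"
    - "transformers"
    - "accelerate"
    - "sentencepiece"
predict: "predict.py:Predictor"
```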
Next, please create predict.py, containing the Predictor class needed for setting up and running the model:
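A sketch of predict.py using the diffusers FluxPipeline. The model ID matches the model card, but the step count, guidance scale, and dtype are illustrative choices:

```python
# Illustrative predict.py; inference arguments are assumptions based on
# the FLUX.1-schnell model card.
import torch
from cog import BasePredictor, Input, Path
from diffusers import FluxPipeline


class Predictor(BasePredictor):
    def setup(self):
        # Load the model weights once, when the container starts
        self.pipe = FluxPipeline.from_pretrained(
            "black-forest-labs/FLUX.1-schnell",
            torch_dtype=torch.bfloat16,
        ).to("cuda")

    def predict(
        self,
        prompt: str = Input(description="Text description of the image"),
    ) -> Path:
        # FLUX.1-schnell is distilled for few-step generation
        image = self.pipe(
            prompt,
            num_inference_steps=4,
            guidance_scale=0.0,
        ).images[0]
        output_path = "/tmp/output.png"
        image.save(output_path)
        return Path(output_path)
```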
Next, run the following command to build the container image:
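Run this inside the flux-schnell folder; by default, Cog names the resulting image after the folder, with a cog- prefix:

```shell
cog build
```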
This step uses the configuration defined in cog.yaml to create the container image and store it in your local Docker image store. The step can take quite some time to complete, as it downloads all the dependencies, such as required libraries and the model weights, and builds the container image.
When the previous step has completed, you should see the container image in your local container registry. To verify, please run:
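To list the images in your local Docker image store:

```shell
docker images
```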
You should see something similar to this, where the image name has the prefix cog- followed by the folder name flux-schnell (the name may differ if you used a different folder name).
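An illustrative listing; the image ID, age, and size below are placeholders:

```
REPOSITORY          TAG       IMAGE ID       CREATED          SIZE
cog-flux-schnell    latest    0123456789ab   2 minutes ago    15GB
```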
Next, tag the image and push it to your remote container registry. We do not support pulling containers with the :latest tag, in order to make sure that all deployments are consistent. Please make sure you use distinct tags for your container updates.
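A sketch of the tag-and-push step, assuming the example registry path from earlier; v1 is an example tag, so use a distinct tag for every update:

```shell
# Replace ghcr.io/username with your own registry path.
docker tag cog-flux-schnell ghcr.io/username/cog-flux-schnell:v1
docker push ghcr.io/username/cog-flux-schnell:v1
```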
This will push the container image to your remote registry. Uploading the image to the container registry can take some time, depending on your network connection.
In this example, we will deploy the image we created earlier on NVIDIA L40S (48 GB VRAM) GPU type. For larger models, you may need to choose one of the other GPU types we offer.
Create a new project or use an existing one, then open the project.
On the left you'll see a navigation menu. Go to Containers -> New deployment. Name your deployment and select the L40S Compute Type.
Set Container Image to point to the repository where you pushed the image you created earlier, for example ghcr.io/username/cog-flux-schnell:v1
You can use the Public option if you pushed the image to a public repository. Use the Private option if you have a private registry, paired with credentials.
Make sure your preferred tag is selected
Set the Exposed HTTP port to 5000
Set the Healthcheck port to 5000
Set Health Check to /health-check
Make sure Start Command is off
Deploy container
(You can leave the Scaling options to their default values for now)
That's it! You have now created a deployment. You can check the logs of the deployment from the logs tab. The deployment will take a few minutes to complete.
For production use, we recommend using authenticated or private registries to avoid potential rate limits imposed by public container registries.
Before you can connect to the endpoint, you will need to generate an authentication token by going to Keys -> Inference API Keys and clicking Create.
The base endpoint URL for your deployment is in the Containers API section in the top left of the screen. This will be in the form of: https://containers.datacrunch.io/<NAME-OF-YOUR-DEPLOYMENT>/
Once the deployment has been created and is ready to accept requests, you can test that it responds correctly by sending a /health-check request to the endpoint. Below is an example cURL command for testing your deployment:
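A minimal sketch, assuming the endpoint expects the Inference API key as a Bearer token; replace the placeholder with your deployment name:

```shell
# DATACRUNCH_API_KEY holds the Inference API key created earlier.
curl -H "Authorization: Bearer $DATACRUNCH_API_KEY" \
  https://containers.datacrunch.io/<NAME-OF-YOUR-DEPLOYMENT>/health-check
```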
This should return a response that shows the deployment is available for use.
After a successful /health-check, we are ready to send inference requests to the model.
Navigate to your project directory, create a new virtual environment, and run the commands below:
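A sketch using Python's built-in venv module; the .venv directory name is just a convention:

```shell
python3 -m venv .venv
source .venv/bin/activate
```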
You may also need to install some required packages:
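The client script in this tutorial calls the endpoint over HTTP, so at minimum you will need an HTTP client library such as requests:

```shell
pip install requests
```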
In the same folder, create a new file named inference.py
and add the following code:
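An illustrative client for a Cog container behind the DataCrunch containers endpoint. The deployment name, the Bearer auth scheme, and the prompt are assumptions; Cog's HTTP server accepts predictions as POST /predictions with an "input" object, and typically returns a Path output as a base64-encoded data URI:

```python
# Illustrative inference.py; adjust the endpoint, key, and prompt.
import base64

import requests

ENDPOINT = "https://containers.datacrunch.io/<NAME-OF-YOUR-DEPLOYMENT>"
API_KEY = "YOUR-INFERENCE-API-KEY"  # from Keys -> Inference API Keys

response = requests.post(
    f"{ENDPOINT}/predictions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"input": {"prompt": "A photo of an astronaut riding a horse"}},
    timeout=600,
)
response.raise_for_status()

# The Path output arrives as a data URI: "data:image/png;base64,...."
output = response.json()["output"]
image_data = output.split(",", 1)[1]

with open("output.png", "wb") as f:
    f.write(base64.b64decode(image_data))
print("Saved output.png")
```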
Run it with the following command:
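With the virtual environment active, run the script:

```shell
python inference.py
```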
The image you generated is located in the folder you ran the script in, named output.png.
You will need a container registry to store the container image. You can use any container registry you prefer; in this example we use the GitHub Container Registry. You can find more information in the GitHub Container Registry documentation.
This concludes our tutorial on how to create images from text using the black-forest-labs/FLUX.1-schnell model. You can now use the endpoint to generate more images from text descriptions.