EmbeddingServer
EmbeddingServer defines a containerized embedding model server managed by the ToolHive operator. The VirtualMCPServer optimizer references an EmbeddingServer to generate vector embeddings for tool discovery.
API: toolhive.stacklok.dev/v1beta1 · Scope: Namespaced · Short names: emb, embedding
Example
apiVersion: toolhive.stacklok.dev/v1beta1
kind: EmbeddingServer
metadata:
  name: my-embeddingserver
  namespace: default
spec: {}
Schema
spec
EmbeddingServerSpec defines the desired state of EmbeddingServer
| Field | Type | Description |
|---|---|---|
| args | string[] | Args are additional arguments to pass to the embedding inference server. |
| env | object[] | Env are environment variables to set in the container. |
| hfTokenSecretRef | object | HFTokenSecretRef is a reference to a Kubernetes Secret containing the Hugging Face token. If provided, the secret value is passed to the embedding server for authentication with Hugging Face. |
| image | string | Image is the container image for the embedding inference server. Images must be from HuggingFace Text Embeddings Inference (https://github.com/huggingface/text-embeddings-inference). Default "ghcr.io/huggingface/text-embeddings-inference:cpu-latest". |
| imagePullPolicy | string | ImagePullPolicy defines the pull policy for the container image. Default "IfNotPresent" · enum: Always, Never, IfNotPresent |
| model | string | Model is the HuggingFace embedding model to use (e.g., "sentence-transformers/all-MiniLM-L6-v2"). Default "BAAI/bge-small-en-v1.5". |
| modelCache | object | ModelCache configures persistent storage for downloaded models. When enabled, models are cached in a PVC and reused across pod restarts. |
| podTemplateSpec | object | PodTemplateSpec allows customizing the pod (node selection, tolerations, etc.). This field accepts a PodTemplateSpec object as JSON/YAML. To modify the container the embedding server runs in, specify the 'embedding' container name in the PodTemplateSpec. |
| port | integer | Port is the port to expose the embedding service on. Default 8080 · format int32 · min 1 · max 65535 |
| replicas | integer | Replicas is the number of embedding server replicas to run. Default 1 · format int32 · min 1 |
| resourceOverrides | object | ResourceOverrides allows overriding annotations and labels for resources created by the operator. |
| resources | object | Resources defines compute resources for the embedding server. |
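
As a rough sketch of how the top-level fields combine, the manifest below sets the image, model, port, and replica count explicitly and customizes the pod through podTemplateSpec. The env variable name is hypothetical, the args entry assumes the Text Embeddings Inference image accepts a `--max-concurrent-requests` flag, and the node selector and security context are only illustrative.

```yaml
apiVersion: toolhive.stacklok.dev/v1beta1
kind: EmbeddingServer
metadata:
  name: my-embeddingserver
  namespace: default
spec:
  image: ghcr.io/huggingface/text-embeddings-inference:cpu-latest
  imagePullPolicy: IfNotPresent
  model: sentence-transformers/all-MiniLM-L6-v2
  port: 8080
  replicas: 2
  args:
    # Assumption: the inference server accepts this flag; check the image's own docs.
    - --max-concurrent-requests
    - "64"
  env:
    # Hypothetical variable, shown only to illustrate the name/value shape.
    - name: EXAMPLE_SETTING
      value: "enabled"
  podTemplateSpec:
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64
      containers:
        # The container must be named 'embedding' for the settings to apply to it.
        - name: embedding
          securityContext:
            allowPrivilegeEscalation: false
```

Fields left out fall back to the defaults listed in the table above.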
spec.env[]
Env are environment variables to set in the container
| Field | Type | Description |
|---|---|---|
| name (required) | string | Name of the environment variable. |
| value (required) | string | Value of the environment variable. |
spec.hfTokenSecretRef
HFTokenSecretRef is a reference to a Kubernetes Secret containing the Hugging Face token. If provided, the secret value is passed to the embedding server for authentication with Hugging Face.
| Field | Type | Description |
|---|---|---|
| key (required) | string | Key is the key within the secret. |
| name (required) | string | Name is the name of the secret. |
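
A minimal sketch of wiring a token through hfTokenSecretRef might look like the following. The Secret name `hf-token` and key `token` are placeholders, and the example assumes the Secret lives in the same namespace as the EmbeddingServer.

```yaml
# Hypothetical Secret holding the token; name, key, and value are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: hf-token
  namespace: default
type: Opaque
stringData:
  token: "<your-hugging-face-token>"
---
apiVersion: toolhive.stacklok.dev/v1beta1
kind: EmbeddingServer
metadata:
  name: my-embeddingserver
  namespace: default
spec:
  hfTokenSecretRef:
    name: hf-token   # matches the Secret's metadata.name
    key: token       # matches a key inside the Secret
```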
spec.modelCache
ModelCache configures persistent storage for downloaded models. When enabled, models are cached in a PVC and reused across pod restarts.
| Field | Type | Description |
|---|---|---|
| accessMode | string | AccessMode is the access mode for the PVC. Default "ReadWriteOnce" · enum: ReadWriteOnce, ReadWriteMany, ReadOnlyMany |
| enabled | boolean | Enabled controls whether model caching is enabled. Default true. |
| size | string | Size is the size of the PVC for model caching (e.g., "10Gi"). Default "10Gi". |
| storageClassName | string | StorageClassName is the storage class to use for the PVC. If not specified, the cluster's default storage class is used. |
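
For illustration, a model cache larger than the default could be requested as sketched below. The storage class name `fast-ssd` is hypothetical and would need to exist in the cluster; omitting it falls back to the cluster's default storage class.

```yaml
apiVersion: toolhive.stacklok.dev/v1beta1
kind: EmbeddingServer
metadata:
  name: my-embeddingserver
  namespace: default
spec:
  modelCache:
    enabled: true
    size: 20Gi
    accessMode: ReadWriteOnce
    storageClassName: fast-ssd   # hypothetical storage class
```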
spec.resourceOverrides
ResourceOverrides allows overriding annotations and labels for resources created by the operator
| Field | Type | Description |
|---|---|---|
| persistentVolumeClaim | object | PersistentVolumeClaim defines overrides for the PVC resource. |
| service | object | Service defines overrides for the Service resource. |
| statefulSet | object | StatefulSet defines overrides for the StatefulSet resource. |
spec.resourceOverrides.persistentVolumeClaim
PersistentVolumeClaim defines overrides for the PVC resource
| Field | Type | Description |
|---|---|---|
| annotations | map<string, string> | Annotations to add or override on the resource. |
| labels | map<string, string> | Labels to add or override on the resource. |
spec.resourceOverrides.service
Service defines overrides for the Service resource
| Field | Type | Description |
|---|---|---|
| annotations | map<string, string> | Annotations to add or override on the resource. |
| labels | map<string, string> | Labels to add or override on the resource. |
spec.resourceOverrides.statefulSet
StatefulSet defines overrides for the StatefulSet resource
| Field | Type | Description |
|---|---|---|
| annotations | map<string, string> | Annotations to add or override on the resource. |
| labels | map<string, string> | Labels to add or override on the resource. |
| podTemplateMetadataOverrides | object | PodTemplateMetadataOverrides defines metadata overrides for the pod template. |
spec.resourceOverrides.statefulSet.podTemplateMetadataOverrides
PodTemplateMetadataOverrides defines metadata overrides for the pod template
| Field | Type | Description |
|---|---|---|
| annotations | map<string, string> | Annotations to add or override on the resource. |
| labels | map<string, string> | Labels to add or override on the resource. |
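
The nesting of the override objects can be sketched as follows. All label and annotation keys and values here are illustrative placeholders, not values the operator expects.

```yaml
apiVersion: toolhive.stacklok.dev/v1beta1
kind: EmbeddingServer
metadata:
  name: my-embeddingserver
  namespace: default
spec:
  resourceOverrides:
    statefulSet:
      labels:
        team: search                     # placeholder label
      podTemplateMetadataOverrides:
        annotations:
          example.com/scrape: "true"     # placeholder annotation on the pods
    service:
      annotations:
        example.com/owner: platform      # placeholder annotation
    persistentVolumeClaim:
      labels:
        team: search
```

Note that only statefulSet carries podTemplateMetadataOverrides; the service and persistentVolumeClaim overrides accept annotations and labels only.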
spec.resources
Resources defines compute resources for the embedding server
| Field | Type | Description |
|---|---|---|
| limits | object | Limits describes the maximum amount of compute resources allowed. |
| requests | object | Requests describes the minimum amount of compute resources required. |
spec.resources.limits
Limits describes the maximum amount of compute resources allowed
| Field | Type | Description |
|---|---|---|
| cpu | string | CPU is the CPU limit in cores (e.g., "500m" for 0.5 cores). |
| memory | string | Memory is the memory limit in bytes (e.g., "64Mi" for 64 mebibytes). |
spec.resources.requests
Requests describes the minimum amount of compute resources required
| Field | Type | Description |
|---|---|---|
| cpu | string | CPU is the CPU request in cores (e.g., "500m" for 0.5 cores). |
| memory | string | Memory is the memory request in bytes (e.g., "64Mi" for 64 mebibytes). |
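
A short sketch of requests and limits set together is shown below. The quantities are arbitrary illustrations, not sizing recommendations for any particular model.

```yaml
apiVersion: toolhive.stacklok.dev/v1beta1
kind: EmbeddingServer
metadata:
  name: my-embeddingserver
  namespace: default
spec:
  resources:
    requests:
      cpu: "500m"      # 0.5 cores reserved for scheduling
      memory: "1Gi"
    limits:
      cpu: "2"         # hard ceiling of 2 cores
      memory: "2Gi"
```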
status
EmbeddingServerStatus defines the observed state of EmbeddingServer
| Field | Type | Description |
|---|---|---|
| conditions | object[] | Conditions represent the latest available observations of the EmbeddingServer's state. |
| message | string | Message provides additional information about the current phase. |
| observedGeneration | integer | ObservedGeneration reflects the generation most recently observed by the controller. Format int64. |
| phase | string | Phase is the current phase of the EmbeddingServer. Enum: Pending, Downloading, Ready, Failed, Terminating |
| readyReplicas | integer | ReadyReplicas is the number of ready replicas. Format int32. |
| url | string | URL is the URL where the embedding service can be accessed. |
status.conditions[]
Conditions represent the latest available observations of the EmbeddingServer's state
| Field | Type | Description |
|---|---|---|
| lastTransitionTime (required) | string | lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. Format date-time. |
| message (required) | string | message is a human readable message indicating details about the transition. This may be an empty string. maxLength 32768 |
| observedGeneration | integer | observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance. Format int64 · min 0 |
| reason (required) | string | reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty. Pattern `^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$` · minLength 1 · maxLength 1024 |
| status (required) | string | status of the condition, one of True, False, Unknown. Enum: True, False, Unknown |
| type (required) | string | type of condition in CamelCase or in foo.example.com/CamelCase. Pattern `^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$` · maxLength 316 |
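
To show the overall shape of the status block, here is a hand-written illustration rather than captured operator output. The condition type, reason, timestamp, and URL are all hypothetical placeholders; only the field names and the phase value come from the schema above.

```yaml
status:
  phase: Ready
  message: ""
  observedGeneration: 3
  readyReplicas: 1
  url: "<service-url>"                    # placeholder; the real value is set by the operator
  conditions:
    - type: Ready                         # hypothetical condition type
      status: "True"
      reason: ExampleReason               # hypothetical reason
      message: ""
      lastTransitionTime: "2025-01-01T00:00:00Z"
      observedGeneration: 3
```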
Related resources
Referenced by:
- VirtualMCPServer, via spec.embeddingServerRef
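
A reference from a VirtualMCPServer might look roughly like the fragment below. This is a sketch under assumptions: it assumes embeddingServerRef identifies the EmbeddingServer through a `name` sub-field and that VirtualMCPServer shares the same API version. Neither is stated on this page, so check the VirtualMCPServer reference before relying on it.

```yaml
apiVersion: toolhive.stacklok.dev/v1beta1   # assumed to match the EmbeddingServer API version
kind: VirtualMCPServer
metadata:
  name: my-vmcp                             # hypothetical name
  namespace: default
spec:
  embeddingServerRef:
    name: my-embeddingserver                # assumed sub-field; points at the EmbeddingServer above
  # Other VirtualMCPServer fields are omitted from this fragment.
```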