EmbeddingServer

EmbeddingServer defines a containerized embedding model server managed by the ToolHive operator. The VirtualMCPServer optimizer references an EmbeddingServer to generate vector embeddings for tool discovery.

API: toolhive.stacklok.dev/v1beta1 · Scope: Namespaced · Short names: emb, embedding

Example

embeddingserver.yaml
apiVersion: toolhive.stacklok.dev/v1beta1
kind: EmbeddingServer
metadata:
  name: my-embeddingserver
  namespace: default
spec: {}

Schema

`spec`

EmbeddingServerSpec defines the desired state of EmbeddingServer

Field	Type	Description
`args`	`string[]`	Args are additional arguments to pass to the embedding inference server
`env`	`object[]`	Env are environment variables to set in the container
`hfTokenSecretRef`	`object`	HFTokenSecretRef is a reference to a Kubernetes Secret containing the Hugging Face token. If provided, the secret value is passed to the embedding server for authentication with Hugging Face.
`image`	`string`	Image is the container image for the embedding inference server. Images must be from Hugging Face Text Embeddings Inference (https://github.com/huggingface/text-embeddings-inference). Default: `"ghcr.io/huggingface/text-embeddings-inference:cpu-latest"`
`imagePullPolicy`	`string`	ImagePullPolicy defines the pull policy for the container image. Default: `"IfNotPresent"` · enum: `Always` \| `Never` \| `IfNotPresent`
`model`	`string`	Model is the Hugging Face embedding model to use (e.g., "sentence-transformers/all-MiniLM-L6-v2"). Default: `"BAAI/bge-small-en-v1.5"`
`modelCache`	`object`	ModelCache configures persistent storage for downloaded models. When enabled, models are cached in a PVC and reused across pod restarts.
`podTemplateSpec`	`object`	PodTemplateSpec allows customizing the pod (node selection, tolerations, etc.). This field accepts a PodTemplateSpec object as JSON/YAML. To modify the container the embedding server runs in, you must specify the `embedding` container name in the PodTemplateSpec.
`port`	`integer`	Port is the port to expose the embedding service on. Default: `8080` · format: `int32` · min: `1` · max: `65535`
`replicas`	`integer`	Replicas is the number of embedding server replicas to run. Default: `1` · format: `int32` · min: `1`
`resourceOverrides`	`object`	ResourceOverrides allows overriding annotations and labels for resources created by the operator
`resources`	`object`	Resources defines compute resources for the embedding server
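
As a rough illustration of how the top-level fields combine, the manifest below sets several of them explicitly. It is a sketch, not a recommended configuration. The replica count, the `--max-batch-tokens` argument (a Text Embeddings Inference option, so check it against your image version), the node selector, and the container security context are illustrative assumptions rather than values taken from this reference.

```yaml
apiVersion: toolhive.stacklok.dev/v1beta1
kind: EmbeddingServer
metadata:
  name: my-embeddingserver
  namespace: default
spec:
  # Documented defaults, written out for clarity.
  model: BAAI/bge-small-en-v1.5
  image: ghcr.io/huggingface/text-embeddings-inference:cpu-latest
  imagePullPolicy: IfNotPresent
  port: 8080
  # Assumption: two replicas, purely for illustration (default is 1).
  replicas: 2
  # Assumption: extra arguments are passed through to the inference server.
  args:
    - "--max-batch-tokens"
    - "8192"
  podTemplateSpec:
    spec:
      # Assumption: example node selector; adjust to your cluster.
      nodeSelector:
        kubernetes.io/arch: amd64
      containers:
        # The container name must be "embedding" to target the server container.
        - name: embedding
          securityContext:
            allowPrivilegeEscalation: false
```

Fields that are omitted fall back to the defaults listed in the table.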

`spec.env[]`

Env are environment variables to set in the container

Field	Type	Description
`name` (required)	`string`	Name of the environment variable
`value` (required)	`string`	Value of the environment variable
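
A minimal sketch of the `env` list follows. The variable names and values are hypothetical and chosen only to show the shape. Because `value` is a string, numeric or boolean-looking values need quoting.

```yaml
# Fragment of an EmbeddingServer manifest.
spec:
  env:
    # Hypothetical proxy setting; replace with variables your image actually reads.
    - name: HTTPS_PROXY
      value: http://proxy.example.com:3128
    # Quote values that YAML would otherwise parse as numbers or booleans.
    - name: EXAMPLE_NUMERIC_SETTING
      value: "1"
```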

↑ Back to spec

`spec.hfTokenSecretRef`

HFTokenSecretRef is a reference to a Kubernetes Secret containing the Hugging Face token. If provided, the secret value is passed to the embedding server for authentication with Hugging Face.

Field	Type	Description
`key` (required)	`string`	Key is the key within the secret
`name` (required)	`string`	Name is the name of the secret
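
The sketch below pairs a Secret with an EmbeddingServer that references it. The Secret name `hf-token` and key `token` are hypothetical, and placing the Secret in the same namespace as the EmbeddingServer is an assumption, since the reference has no namespace field.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: hf-token            # hypothetical name
  namespace: default
type: Opaque
stringData:
  token: "<your-hugging-face-token>"   # placeholder, not a real token
---
apiVersion: toolhive.stacklok.dev/v1beta1
kind: EmbeddingServer
metadata:
  name: my-embeddingserver
  namespace: default
spec:
  hfTokenSecretRef:
    name: hf-token          # must match the Secret's metadata.name
    key: token              # must match a key in the Secret's data
```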

↑ Back to spec

`spec.modelCache`

ModelCache configures persistent storage for downloaded models. When enabled, models are cached in a PVC and reused across pod restarts.

Field	Type	Description
`accessMode`	`string`	AccessMode is the access mode for the PVC. Default: `"ReadWriteOnce"` · enum: `ReadWriteOnce` \| `ReadWriteMany` \| `ReadOnlyMany`
`enabled`	`boolean`	Enabled controls whether model caching is enabled. Default: `true`
`size`	`string`	Size is the size of the PVC for model caching (e.g., "10Gi"). Default: `"10Gi"`
`storageClassName`	`string`	StorageClassName is the storage class to use for the PVC. If not specified, the cluster's default storage class is used.
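
For illustration, a model cache configuration might look like the fragment below. The size and the storage class name `fast-ssd` are assumptions. Omit `storageClassName` to use the cluster's default storage class.

```yaml
# Fragment of an EmbeddingServer manifest.
spec:
  modelCache:
    enabled: true
    # Assumption: larger than the 10Gi default, to leave room for bigger models.
    size: 20Gi
    accessMode: ReadWriteOnce
    # Hypothetical storage class; it must exist in your cluster.
    storageClassName: fast-ssd
```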

↑ Back to spec

`spec.resourceOverrides`

ResourceOverrides allows overriding annotations and labels for resources created by the operator

Field	Type	Description
`persistentVolumeClaim`	`object`	PersistentVolumeClaim defines overrides for the PVC resource
`service`	`object`	Service defines overrides for the Service resource
`statefulSet`	`object`	StatefulSet defines overrides for the StatefulSet resource

↑ Back to spec

`spec.resourceOverrides.persistentVolumeClaim`

PersistentVolumeClaim defines overrides for the PVC resource

Field	Type	Description
`annotations`	`map<string, string>`	Annotations to add or override on the resource
`labels`	`map<string, string>`	Labels to add or override on the resource

↑ Back to spec.resourceOverrides

`spec.resourceOverrides.service`

Service defines overrides for the Service resource

Field	Type	Description
`annotations`	`map<string, string>`	Annotations to add or override on the resource
`labels`	`map<string, string>`	Labels to add or override on the resource

↑ Back to spec.resourceOverrides

`spec.resourceOverrides.statefulSet`

StatefulSet defines overrides for the StatefulSet resource

Field	Type	Description
`annotations`	`map<string, string>`	Annotations to add or override on the resource
`labels`	`map<string, string>`	Labels to add or override on the resource
`podTemplateMetadataOverrides`	`object`	PodTemplateMetadataOverrides defines metadata overrides for the pod template

↑ Back to spec.resourceOverrides

`spec.resourceOverrides.statefulSet.podTemplateMetadataOverrides`

PodTemplateMetadataOverrides defines metadata overrides for the pod template

Field	Type	Description
`annotations`	`map<string, string>`	Annotations to add or override on the resource
`labels`	`map<string, string>`	Labels to add or override on the resource
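
The fragment below sketches how overrides for the three operator-created resources can sit side by side. All annotation and label keys and values are hypothetical examples. Annotation and label values are strings, so values such as `true` need quoting.

```yaml
# Fragment of an EmbeddingServer manifest.
spec:
  resourceOverrides:
    service:
      annotations:
        example.com/owner: platform-team      # hypothetical
    persistentVolumeClaim:
      labels:
        example.com/backup: "true"            # hypothetical
    statefulSet:
      labels:
        example.com/tier: inference           # hypothetical
      # Applied to the pod template's metadata, not to the StatefulSet itself.
      podTemplateMetadataOverrides:
        annotations:
          example.com/scrape-metrics: "true"  # hypothetical
```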

↑ Back to spec.resourceOverrides.statefulSet

`spec.resources`

Resources defines compute resources for the embedding server

Field	Type	Description
`limits`	`object`	Limits describes the maximum amount of compute resources allowed
`requests`	`object`	Requests describes the minimum amount of compute resources required

↑ Back to spec

`spec.resources.limits`

Limits describes the maximum amount of compute resources allowed

Field	Type	Description
`cpu`	`string`	CPU is the CPU limit in cores (e.g., "500m" for 0.5 cores)
`memory`	`string`	Memory is the memory limit as a Kubernetes quantity (e.g., "64Mi" for 64 mebibytes)

↑ Back to spec.resources

`spec.resources.requests`

Requests describes the minimum amount of compute resources required

Field	Type	Description
`cpu`	`string`	CPU is the CPU request in cores (e.g., "500m" for 0.5 cores)
`memory`	`string`	Memory is the memory request as a Kubernetes quantity (e.g., "64Mi" for 64 mebibytes)
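
A minimal sketch of `requests` and `limits` together follows. The quantities are illustrative assumptions, not sizing guidance for any particular model. Both `cpu` and `memory` are strings, so a whole-core value such as `2` is quoted.

```yaml
# Fragment of an EmbeddingServer manifest.
spec:
  resources:
    requests:
      cpu: 500m        # 0.5 cores reserved for scheduling
      memory: 1Gi
    limits:
      cpu: "2"         # quoted so YAML treats it as a string
      memory: 2Gi
```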

↑ Back to spec.resources

`status`

EmbeddingServerStatus defines the observed state of EmbeddingServer

Field	Type	Description
`conditions`	`object[]`	Conditions represent the latest available observations of the EmbeddingServer's state
`message`	`string`	Message provides additional information about the current phase
`observedGeneration`	`integer`	ObservedGeneration reflects the generation most recently observed by the controller. Format: `int64`
`phase`	`string`	Phase is the current phase of the EmbeddingServer. Enum: `Pending` \| `Downloading` \| `Ready` \| `Failed` \| `Terminating`
`readyReplicas`	`integer`	ReadyReplicas is the number of ready replicas. Format: `int32`
`url`	`string`	URL is the URL where the embedding service can be accessed

`status.conditions[]`

Conditions represent the latest available observations of the EmbeddingServer's state

Field	Type	Description
`lastTransitionTime` (required)	`string`	lastTransitionTime is the last time the condition transitioned from one status to another. This should be when the underlying condition changed. If that is not known, then using the time when the API field changed is acceptable. Format: `date-time`
`message` (required)	`string`	message is a human readable message indicating details about the transition. This may be an empty string. Max length: `32768`
`observedGeneration`	`integer`	observedGeneration represents the .metadata.generation that the condition was set based upon. For instance, if .metadata.generation is currently 12, but the .status.conditions[x].observedGeneration is 9, the condition is out of date with respect to the current state of the instance. Format: `int64` · min: `0`
`reason` (required)	`string`	reason contains a programmatic identifier indicating the reason for the condition's last transition. Producers of specific condition types may define expected values and meanings for this field, and whether the values are considered a guaranteed API. The value should be a CamelCase string. This field may not be empty. Pattern: `^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$` · min length: `1` · max length: `1024`
`status` (required)	`string`	status of the condition, one of True, False, Unknown. Enum: `True` \| `False` \| `Unknown`
`type` (required)	`string`	type of condition in CamelCase or in foo.example.com/CamelCase. Pattern: `^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$` · max length: `316`

↑ Back to status

Related resources

Referenced by:

VirtualMCPServer - via spec.embeddingServerRef
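
A hedged sketch of the referencing side is shown below. Only the `spec.embeddingServerRef` path comes from this page. The `name` subfield, the shared API version, and the resource name `my-vmcp` are assumptions, and the other VirtualMCPServer fields are omitted, so consult the VirtualMCPServer reference for the actual schema.

```yaml
# Assumption: VirtualMCPServer shares the same API group and version.
apiVersion: toolhive.stacklok.dev/v1beta1
kind: VirtualMCPServer
metadata:
  name: my-vmcp               # hypothetical name
  namespace: default
spec:
  embeddingServerRef:
    # Assumption: the reference selects the EmbeddingServer by name.
    name: my-embeddingserver
  # Other VirtualMCPServer fields are omitted from this fragment.
```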
