Adds custom inference service API docs #4852

Draft: wants to merge 13 commits into main

Contributor

@szabosteve szabosteve commented Jul 9, 2025

Overview

Related issue: https://github.com/elastic/developer-docs-team/issues/307

This PR adds documentation about the custom inference service.

@jonathan-buttner Could you please provide an example request that I can add to the docs?

/**
* Specifies the JSON parser that is used to parse the response from the custom service.
* Different task types require different json_parser parameters.
* For example:
Contributor Author

@szabosteve szabosteve Jul 9, 2025

@jonathan-buttner Do you think we should specify a JsonParser class for each task type, or is this list sufficient?
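
As a point of reference for this question, here is a minimal TypeScript sketch of what per-task-type parser types might look like. It models only the `json_parser` fields that appear in the example requests in this thread (`text_embeddings`, `reranked_index`, `relevance_score`). The names `TextEmbeddingJsonParser`, `RerankJsonParser`, `JsonParserByTaskType`, and `jsonParserFor` are hypothetical and are not taken from the specification, and treating every field as required is an assumption.

```typescript
// Sketch only: models just the json_parser fields used in this thread's examples.
interface TextEmbeddingJsonParser {
  text_embeddings: string; // JSONPath to the embeddings
}

interface RerankJsonParser {
  reranked_index: string; // JSONPath to the document indices
  relevance_score: string; // JSONPath to the scores
}

// Hypothetical mapping from task type to its parser shape.
interface JsonParserByTaskType {
  text_embedding: TextEmbeddingJsonParser;
  rerank: RerankJsonParser;
}

// The generic parameter ties the parser shape to the task type at compile time.
function jsonParserFor<T extends keyof JsonParserByTaskType>(
  taskType: T,
  parser: JsonParserByTaskType[T]
): { taskType: T; json_parser: JsonParserByTaskType[T] } {
  return { taskType, json_parser: parser };
}

const rerankParser = jsonParserFor("rerank", {
  reranked_index: "$.results[*].index",
  relevance_score: "$.results[*].relevance_score",
});
console.log(rerankParser);
```

With this shape, passing `text_embeddings` for the `rerank` task type would be rejected by the compiler, which is the main benefit of one class per task type over a single flat list.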

}

export enum CustomServiceType {
custom
Contributor Author

@jonathan-buttner Should the ServiceType be custom whenever it's specified for this service type? Or can it be anything, for example custom-model?

/**
* Create a custom inference endpoint.
*
* You can create an inference endpoint to perform an inference task with a custom model that supports the HTTP format.
Contributor Author

@jonathan-buttner Please suggest an alternative description if you think this is not sufficient. I tried to come up with something that is meaningful to me based on my limited knowledge.

* The chunking configuration object.
* @ext_doc_id inference-chunking
*/
chunking_settings?: InferenceChunkingSettings
Contributor Author

Are chunking settings relevant for this service?

Contributor

github-actions bot commented Jul 9, 2025

Below are the validation changes against the target branch for the APIs.

No changes detected.

You can validate these APIs yourself by using the make validate target.

Contributor

jonathan-buttner commented Jul 9, 2025

WIP (I'll update this comment with a bunch of examples).

Here are some examples:

OpenAI Text Embedding
PUT _inference/text_embedding/test
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "https://api.openai.com/v1/embeddings",
        "headers": {
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json;charset=utf-8"
        },
        "request": "{\"input\": ${input}, \"model\": \"text-embedding-3-small\"}",
        "response": {
            "json_parser": {
                "text_embeddings": "$.data[*].embedding[*]"
            }
        }
    }
}
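
To make the placeholder syntax in the request above concrete, here is a minimal TypeScript sketch of how `${input}` and `${api_key}` might be expanded before the HTTP call is made. The `expand` helper is hypothetical and not part of Elasticsearch. The sketch assumes that body placeholders are replaced with JSON-encoded values and that header placeholders are replaced with the raw secret string; the actual substitution rules may differ.

```typescript
// Hypothetical helper, not an Elasticsearch API: expands ${name} placeholders.
// `encode` decides how each value is written into the template.
// Placeholders without a matching value are left untouched.
function expand(
  template: string,
  values: Record<string, unknown>,
  encode: (value: unknown) => string
): string {
  return template.replace(/\$\{(\w+)\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? encode(values[name]) : match
  );
}

// Assumption: body placeholders become JSON-encoded values.
const requestTemplate = '{"input": ${input}, "model": "text-embedding-3-small"}';
const body = expand(
  requestTemplate,
  { input: ["first text", "second text"] },
  (value) => JSON.stringify(value)
);

// Assumption: header placeholders are replaced with the raw string.
const authorization = expand(
  "Bearer ${api_key}",
  { api_key: "<api key>" },
  (value) => String(value)
);

console.log(body);
console.log(authorization);
```

Because `${input}` is written without surrounding quotes in the template, the expanded body stays valid JSON only if the substituted value is itself JSON-encoded, which is why the sketch uses `JSON.stringify` for the body.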
Cohere APIv2 Rerank
PUT _inference/rerank/test-rerank
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "https://api.cohere.com/v2/rerank",
        "headers": {
            "Authorization": "bearer ${api_key}",
            "Content-Type": "application/json"
        },
        "request": "{\"documents\": ${input}, \"query\": ${query}, \"model\": \"rerank-v3.5\"}",
        "response": {
            "json_parser": {
                "reranked_index":"$.results[*].index",
                "relevance_score":"$.results[*].relevance_score"
            }
        }
    }
}
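
The two `json_parser` paths in the rerank request above select parallel arrays from the provider response. A minimal TypeScript sketch of how they might pair up follows. The `sampleResponse` object is a hand-written placeholder shaped only to match `$.results[*].index` and `$.results[*].relevance_score`, not real Cohere output, and `pairRanking` is a hypothetical helper. No JSONPath library is used; plain `map` calls stand in for the path evaluation.

```typescript
interface RerankResult {
  index: number;
  relevance_score: number;
}

// Hand-written placeholder shaped to match the two JSONPaths; not real API output.
const sampleResponse: { results: RerankResult[] } = {
  results: [
    { index: 2, relevance_score: 0.91 },
    { index: 0, relevance_score: 0.35 },
    { index: 1, relevance_score: 0.08 },
  ],
};

// Plain-TypeScript stand-ins for evaluating the two paths.
const rerankedIndex = sampleResponse.results.map((r) => r.index); // $.results[*].index
const relevanceScore = sampleResponse.results.map((r) => r.relevance_score); // $.results[*].relevance_score

// Hypothetical helper: zip the parallel arrays into (document, score) pairs,
// using each index to look up the original input document.
function pairRanking(documents: string[], indices: number[], scores: number[]) {
  return indices.map((docIndex, position) => ({
    document: documents[docIndex],
    score: scores[position],
  }));
}

const ranked = pairRanking(["doc a", "doc b", "doc c"], rerankedIndex, relevanceScore);
console.log(ranked);
```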
Cohere APIv2 Text Embedding
PUT _inference/text_embedding/test-text-embedding
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "https://api.cohere.com/v2/embed",
        "headers": {
            "Authorization": "bearer ${api_key}",
            "Content-Type": "application/json"
        },
        "request": "{\"texts\": ${input}, \"model\": \"embed-v4.0\", \"input_type\": ${input_type}}",
        "response": {
            "json_parser": {
                "text_embeddings":"$.embeddings.float[*]"
            }
        },
        "input_type": {
            "translation": {
                "ingest": "search_document",
                "search": "search_query"
            },
            "default": "search_document"
        }
    }
}
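
The `input_type` object in the request above maps internal input types to the values the provider expects. A minimal TypeScript sketch of that lookup follows. `InputTypeSettings` and `resolveInputType` are hypothetical names, and the fallback to `default` when no type is given or no translation is listed is an assumption about the intended behavior.

```typescript
// Mirrors the "input_type" object from the request above.
interface InputTypeSettings {
  translation: Record<string, string>;
  default: string;
}

const inputTypeSettings: InputTypeSettings = {
  translation: { ingest: "search_document", search: "search_query" },
  default: "search_document",
};

// Hypothetical helper: translate an internal input type to the provider's value.
// Assumption: fall back to "default" when no type is given or none is listed.
function resolveInputType(settings: InputTypeSettings, internalType?: string): string {
  if (
    internalType !== undefined &&
    Object.prototype.hasOwnProperty.call(settings.translation, internalType)
  ) {
    return settings.translation[internalType] ?? settings.default;
  }
  return settings.default;
}

console.log(resolveInputType(inputTypeSettings, "search"));
console.log(resolveInputType(inputTypeSettings));
```

The resolved string is what would then be substituted for `${input_type}` in the request template.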
Jina AI Rerank
PUT _inference/rerank/jina
{
  "service": "custom",
  "service_settings": {
    "secret_parameters": {
      "api_key": "<api key>"
    },
    "url": "https://api.jina.ai/v1/rerank",
    "headers": {
      "Content-Type": "application/json",
      "Authorization": "Bearer ${api_key}"
    },
    "request": "{\"model\": \"jina-reranker-v2-base-multilingual\",\"query\": ${query},\"documents\":${input}}",
    "response": {
      "json_parser": {
        "relevance_score": "$.results[*].relevance_score",
        "reranked_index": "$.results[*].index"
      }
    }
  }
}
Hugging Face Text Embedding for model Qwen/Qwen3-Embedding-8B (other models will be very similar)
PUT _inference/text_embedding/test-text-embedding
{
    "service": "custom",
    "service_settings": {
        "secret_parameters": {
            "api_key": "<api key>"
        },
        "url": "<dedicated inference endpoint on HF>/v1/embeddings",
        "headers": {
            "Authorization": "Bearer ${api_key}",
            "Content-Type": "application/json"
        },
        "request": "{\"input\": ${input}}",
        "response": {
            "json_parser": {
                "text_embeddings":"$.data[*].embedding[*]"
            }
        }
    }
}

TODO

  • VoyageAI
  • Hugging Face Rerank
  • Google VertexAI
  • Azure
