<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="rss.xsl"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>KServe Blog</title>
        <link>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog</link>
        <description>KServe Blog</description>
        <lastBuildDate>Tue, 27 May 2025 00:00:00 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <language>en</language>
        <item>
            <title><![CDATA[Announcing KServe v0.15 - Advancing Generative AI Model Serving]]></title>
            <link>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.15-release</link>
            <guid>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.15-release</guid>
            <pubDate>Tue, 27 May 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[KServe 0.15 Release Blog Post]]></description>
            <content:encoded><![CDATA[<p><em>Published on May 27, 2025</em></p>
<p>We are thrilled to announce the release of <strong>KServe v0.15</strong>, marking a significant leap forward in serving both predictive and generative AI models. This release introduces enhanced support for generative AI workloads, including advanced features for serving large language models (LLMs), improved model and KV caching mechanisms, and integration with Envoy AI Gateway.</p>
<p><img decoding="async" loading="lazy" alt="!generative_inference" src="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/assets/images/kserve_generative_inference-21648e7df404ea6f57b9d3c83e8e0ca4.png" width="911" height="581" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-embracing-generative-ai-workloads">🤖 Embracing Generative AI Workloads<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.15-release#-embracing-generative-ai-workloads" class="hash-link" aria-label="Direct link to 🤖 Embracing Generative AI Workloads" title="Direct link to 🤖 Embracing Generative AI Workloads" translate="no">​</a></h2>
<p>KServe v0.15 brings first-class support for generative AI workloads, marking a key evolution beyond traditional predictive AI. Unlike predictive models that infer outcomes from existing data, generative models like large language models (LLMs) create new content from prompts. This fundamental difference introduces new serving challenges. KServe now provides the infrastructure and optimizations needed to serve these models efficiently at scale.</p>
<p>To support these workloads, we've introduced a dedicated <strong>Generative AI</strong> section in our documentation, detailing the new capabilities and configurations tailored for generative models.</p>
<p>KServe now offers a <strong>lightweight</strong> installation for hosting LLMs on Kubernetes. To get started, please follow the <a href="https://kserve.github.io/archive/0.15/admin/kubernetes_deployment" target="_blank" rel="noopener noreferrer" class="">generative inference installation guide</a>. KEDA is an optional component for scaling based on LLM-specific metrics, and Envoy AI Gateway is integrated for advanced traffic management capabilities, including token rate limiting, a unified API, and intelligent routing.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-key-generative-ai-features-in-v015">🚀 Key Generative AI Features in v0.15<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.15-release#-key-generative-ai-features-in-v015" class="hash-link" aria-label="Direct link to 🚀 Key Generative AI Features in v0.15" title="Direct link to 🚀 Key Generative AI Features in v0.15" translate="no">​</a></h2>
<ul>
<li class=""><strong>Envoy AI Gateway Integration</strong></li>
<li class=""><strong>Multi Node Inference</strong></li>
<li class=""><strong>LLM Autoscaler with KEDA</strong></li>
<li class=""><strong>Distributed KV Cache with LMCache</strong></li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-envoy-ai-gateway-support">🌐 Envoy AI Gateway Support<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.15-release#-envoy-ai-gateway-support" class="hash-link" aria-label="Direct link to 🌐 Envoy AI Gateway Support" title="Direct link to 🌐 Envoy AI Gateway Support" translate="no">​</a></h3>
<p>KServe v0.15 adds initial support for <a href="https://aigateway.envoyproxy.io/" target="_blank" rel="noopener noreferrer" class=""><strong>Envoy AI Gateway</strong></a>, a CNCF open source project built on top of <a href="https://gateway.envoyproxy.io/" target="_blank" rel="noopener noreferrer" class="">Envoy Gateway</a> and designed specifically for managing generative AI traffic at scale.</p>
<p><a href="https://gateway.envoyproxy.io/" target="_blank" rel="noopener noreferrer" class="">Envoy Gateway</a> is also now supported in KServe along with <a href="https://gateway-api.sigs.k8s.io/" target="_blank" rel="noopener noreferrer" class="">Kubernetes Gateway API</a>. Unlike traditional gateway solutions, Envoy AI Gateway provides advanced capabilities tailored to AI serving, including:</p>
<ul>
<li class="">Dynamic model routing based on request content, model metadata, or user context.</li>
<li class="">Built-in support for multi-tenant inference, with fine-grained access controls and authentication.</li>
<li class="">Unified API for routing and managing LLM/AI traffic easily.</li>
<li class="">Integrated observability for model-level performance insights.</li>
<li class="">Extensibility for inference-specific policies like rate-limiting by token, and model lifecycle management.</li>
<li class="">Automatic failover mechanisms to ensure service reliability.</li>
</ul>
<p>This integration enables a unified, intelligent entrypoint for both predictive and generative workloads—scaling from traditional models to complex LLMs—all while abstracting infrastructure complexity from the user. Please refer to <a href="https://kserve.github.io/archive/0.15/admin/ai-gateway_integration" target="_blank" rel="noopener noreferrer" class="">Envoy AI Gateway integration doc</a> for more details.</p>
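<p>As an illustrative sketch of how such routing can be expressed, the example below shows an Envoy AI Gateway route that exposes a unified OpenAI-style API and routes by model name. The resource kind, field names, and the header used here follow the Envoy AI Gateway project's conventions as we understand them, and the route and backend names are hypothetical; consult the Envoy AI Gateway documentation for the authoritative API.</p>
<pre><code class="language-yaml">apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm-route                      # hypothetical name
spec:
  schema:
    name: OpenAI                       # expose a unified OpenAI-compatible API
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model      # model-aware routing on the requested model
              value: llama3
      backendRefs:
        - name: kserve-llama3-backend  # hypothetical backend pointing at the InferenceService
</code></pre>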
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-multi-node-inference">🔗 Multi-Node Inference<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.15-release#-multi-node-inference" class="hash-link" aria-label="Direct link to 🔗 Multi-Node Inference" title="Direct link to 🔗 Multi-Node Inference" translate="no">​</a></h3>
<p>To support LLMs too large for a single node (e.g., Llama 3.1 405B), KServe v0.15 introduces multi-node inference across distributed GPUs, unlocking large model serving at scale. As models continue to increase in size, multi-node inference capabilities are increasingly important for production deployments that require real-time user experience. Please refer to the <a href="https://kserve.github.io/archive/0.15/modelserving/v1beta1/llm/huggingface/multi-node" target="_blank" rel="noopener noreferrer" class="">Multi Node inference doc</a> for more details.</p>
<p>The community is also working on a <a href="https://github.com/kserve/kserve/issues/4433" target="_blank" rel="noopener noreferrer" class="">new distributed inference API</a> to allow scaling multi-node inference and to support Disaggregated Prefilling, which is targeted at large LLM deployments.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> serving.kserve.io/v1beta1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferenceService</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> huggingface</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama3</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">predictor</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">modelFormat</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> huggingface</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">storageUri</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> pvc</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">//llama</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">8b</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">pvc/hf/8b_instruction_tuned</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">workerSpec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">pipelineParallelSize</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" 
style="color:#36acaa">2</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">tensorParallelSize</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-llm-autoscaler-with-keda-kubernetes-event-driven-autoscaling">⚡ LLM Autoscaler with KEDA <a href="https://keda.sh/" target="_blank" rel="noopener noreferrer" class="">(Kubernetes Event-driven Autoscaling)</a><a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.15-release#-llm-autoscaler-with-keda-kubernetes-event-driven-autoscaling" class="hash-link" aria-label="Direct link to -llm-autoscaler-with-keda-kubernetes-event-driven-autoscaling" title="Direct link to -llm-autoscaler-with-keda-kubernetes-event-driven-autoscaling" translate="no">​</a></h3>
<p>Autoscaling LLMs is challenging due to their high resource demands and variable inference traffic patterns. The dynamic nature of LLM inference, with varying input lengths and token generation speeds, further complicates the prediction of resource needs, demanding sophisticated and adaptive autoscaling solutions. KServe now integrates with <a href="https://keda.sh/" target="_blank" rel="noopener noreferrer" class=""><strong>KEDA</strong></a> (Kubernetes Event-Driven Autoscaling), which extends Kubernetes' native Horizontal Pod Autoscaler (HPA) and offers a powerful solution to many of the challenges of LLM autoscaling. KEDA can monitor custom metrics, which means you can expose LLM metrics from your inference servers and use KEDA to scale based on these precise indicators.</p>
<p>This empowers users to efficiently manage LLM workloads with more intelligent scaling decisions based on workload characteristics for improved performance and cost optimization. Please follow the <a href="https://kserve.github.io/archive/0.15/modelserving/autoscaling/keda/autoscaling_llm" target="_blank" rel="noopener noreferrer" class="">tutorial doc</a> for how to autoscale based on vLLM metrics.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> serving.kserve.io/v1beta1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferenceService</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> huggingface</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">keda</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">annotations</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" 
style="color:#00a4db">serving.kserve.io/autoscalerClass</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"keda"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">sidecar.opentelemetry.io/inject</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"huggingface-llama3-keda"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">predictor</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">modelFormat</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> huggingface</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">args</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model_name=llama3</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model_id=meta</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama/meta</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">70b</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">minReplicas</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">maxReplicas</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" 
style="color:#36acaa">5</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">autoScaling</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">metrics</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">type</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> PodMetric</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">podmetric</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token key atrule" style="color:#00a4db">metric</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token key atrule" style="color:#00a4db">backend</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"opentelemetry"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token key atrule" 
style="color:#00a4db">metricNames</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> vllm</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">num_requests_running</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token key atrule" style="color:#00a4db">query</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"vllm:num_requests_running"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token key atrule" style="color:#00a4db">target</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token key atrule" style="color:#00a4db">type</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Value</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">              </span><span class="token key atrule" style="color:#00a4db">value</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"4"</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-distributed-kv-cache-with-lmcache">🚀 Distributed KV Cache with <a href="https://lmcache.ai/" target="_blank" rel="noopener noreferrer" class="">LMCache</a><a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.15-release#-distributed-kv-cache-with-lmcache" class="hash-link" aria-label="Direct link to -distributed-kv-cache-with-lmcache" title="Direct link to -distributed-kv-cache-with-lmcache" translate="no">​</a></h3>
<p>Key-Value (KV) cache offloading is a technique used in large language model (LLM) serving to store and reuse the intermediate key and value tensors generated during model inference. In transformer-based models, these KV caches represent the context for each token processed, and reusing them allows the model to avoid redundant computations for repeated or similar prompts.</p>
<p>Enabling KV cache offloading across multiple requests and serving instances can reduce Time To First Token (TTFT), improve scalability by sharing the cache across replicas, and improve the user experience for multi-turn QA or RAG.</p>
<p>KServe integrates <a href="https://lmcache.ai/" target="_blank" rel="noopener noreferrer" class="">LMCache</a>, a state-of-the-art KV cache layer library developed by LMCache Lab, to reduce inference costs and meet SLOs for both latency and throughput at scale. Please follow the <a href="https://kserve.github.io/archive/0.15/modelserving/v1beta1/llm/huggingface/kv_cache_offloading/#overview" target="_blank" rel="noopener noreferrer" class="">LMCache integration doc</a> to optimize your GenAI inference workloads.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> serving.kserve.io/v1beta1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferenceService</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> huggingface</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">lmcache</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">predictor</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">minReplicas</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">2</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">modelFormat</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> huggingface</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">args</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model_name=llama3</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span 
class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model_id=meta</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama/meta</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">70b</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">kv</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">transfer</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">config</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">'{"kv_connector":"LMCacheConnectorV1", "kv_role":"kv_both"}'</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">enable</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">chunked</span><span class="token punctuation" 
style="color:#393A34">-</span><span class="token plain">prefill</span><br></span></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-advanced-model-caching-mechanisms">📦 Advanced Model Caching Mechanisms<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.15-release#-advanced-model-caching-mechanisms" class="hash-link" aria-label="Direct link to 📦 Advanced Model Caching Mechanisms" title="Direct link to 📦 Advanced Model Caching Mechanisms" translate="no">​</a></h3>
<p>To reduce model loading times and improve overall efficiency of serving large models, KServe v0.15 introduces advanced model caching features:</p>
<ul>
<li class=""><strong>LocalModelCache Enhancements:</strong> Improved the LocalModelCache custom resource to support multiple node groups, providing greater flexibility in model placement and caching strategies.</li>
<li class=""><strong>Node Agent Improvements:</strong> Enhanced the local model node agent for better performance and reliability.</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-enhanced-vllm-backend-support">🔧 Enhanced vLLM Backend Support<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.15-release#-enhanced-vllm-backend-support" class="hash-link" aria-label="Direct link to 🔧 Enhanced vLLM Backend Support" title="Direct link to 🔧 Enhanced vLLM Backend Support" translate="no">​</a></h3>
<p>The vLLM backend has been significantly upgraded to better serve generative AI models:</p>
<ul>
<li class=""><strong>Version Upgrade:</strong> Updated to vLLM 0.8.5, bringing performance improvements with the vLLM v1 engine and new features.</li>
<li class=""><strong>Qwen3 &amp; Llama4:</strong> Added support for Qwen3 and Llama4 models.</li>
<li class=""><strong>Reranking Support:</strong> Added support for reranking models.</li>
<li class=""><strong>Embedding Support:</strong> Added support for OpenAI-compatible embeddings API, enabling a broader range of applications.</li>
</ul>
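<p>As a hedged illustration of the new OpenAI-compatible embeddings support, the sketch below builds an OpenAI-style embeddings payload with only the standard library; the <code>/openai/v1/embeddings</code> route prefix and the model name are assumptions, so check your runtime's documentation and your InferenceService URL.</p>

```python
# Sketch: calling the OpenAI-compatible embeddings API of a KServe
# Hugging Face runtime. The route prefix and model name are assumptions;
# verify them against your InferenceService URL and runtime docs.
import json
import urllib.request


def build_embeddings_request(model, texts):
    """Build an OpenAI-style embeddings payload."""
    return {"model": model, "input": texts}


def post_embeddings(base_url, payload):
    """POST the payload to the (assumed) /openai/v1/embeddings route."""
    req = urllib.request.Request(
        f"{base_url}/openai/v1/embeddings",  # assumed route prefix
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # "bge-base" is a hypothetical deployed model name.
    payload = build_embeddings_request("bge-base", ["KServe serves models."])
    print(json.dumps(payload))
```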
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="️-additional-improvements">🛠️ Additional Improvements<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.15-release#%EF%B8%8F-additional-improvements" class="hash-link" aria-label="Direct link to 🛠️ Additional Improvements" title="Direct link to 🛠️ Additional Improvements" translate="no">​</a></h2>
<p>This release also includes several other enhancements:</p>
<ul>
<li class="">Support Deep Health Checks <a href="https://github.com/kserve/kserve/pull/3348" target="_blank" rel="noopener noreferrer" class="">#3348</a></li>
<li class="">Collocated Transformer &amp; Predictor Feature <a href="https://github.com/kserve/kserve/pull/4255" target="_blank" rel="noopener noreferrer" class="">#4255</a></li>
<li class="">Kubernetes Gateway API support <a href="https://github.com/kserve/kserve/pull/3952" target="_blank" rel="noopener noreferrer" class="">#3952</a></li>
<li class="">Security Updates</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-release-notes">🔍 Release Notes<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.15-release#-release-notes" class="hash-link" aria-label="Direct link to 🔍 Release Notes" title="Direct link to 🔍 Release Notes" translate="no">​</a></h2>
<p>For complete release notes including all changes, bug fixes, and known issues, visit the <a href="https://github.com/kserve/kserve/releases/tag/v0.15.0" target="_blank" rel="noopener noreferrer" class="">GitHub release page</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-acknowledgments">🙏 Acknowledgments<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.15-release#-acknowledgments" class="hash-link" aria-label="Direct link to 🙏 Acknowledgments" title="Direct link to 🙏 Acknowledgments" translate="no">​</a></h2>
<p>We extend our gratitude to all the contributors who made this release possible. Your efforts continue to drive the advancement of KServe as a leading platform for serving machine learning models.</p>
<ul>
<li class=""><strong>Core Contributors</strong>: The KServe maintainers, along with regular and new contributors</li>
<li class=""><strong>Community</strong>: Everyone who reported issues, provided feedback, and tested features</li>
<li class=""><strong>Special Recognition</strong>: The generative AI community for their valuable input on LLM serving requirements</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-join-the-community">🤝 Join the Community<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.15-release#-join-the-community" class="hash-link" aria-label="Direct link to 🤝 Join the Community" title="Direct link to 🤝 Join the Community" translate="no">​</a></h2>
<p>We invite you to explore the new features in KServe v0.15 and contribute to the ongoing development of the project:</p>
<ul>
<li class="">Visit our <a href="https://kserve.github.io/website/" target="_blank" rel="noopener noreferrer" class="">Website</a> or <a href="https://github.com/kserve" target="_blank" rel="noopener noreferrer" class="">GitHub</a></li>
<li class="">Join the Slack (<a href="https://github.com/kserve/community?tab=readme-ov-file#questions-and-issues" target="_blank" rel="noopener noreferrer" class="">#kserve</a>)</li>
<li class="">Attend our community meeting by subscribing to the <a href="https://zoom-lfx.platform.linuxfoundation.org/meetings/kserve?view=month" target="_blank" rel="noopener noreferrer" class="">KServe calendar</a>.</li>
<li class="">View our <a href="https://github.com/kserve/community" target="_blank" rel="noopener noreferrer" class="">community GitHub repository</a> to learn how to make contributions. We are excited to work with you to make KServe better and promote its adoption!</li>
</ul>
<p><strong>Happy serving!</strong></p>
<hr>
<p><em>The KServe team is committed to making machine learning model serving simple, scalable, and standardized. Thank you for being part of our community!</em></p>]]></content:encoded>
            <category>Releases</category>
        </item>
        <item>
            <title><![CDATA[Announcing KServe v0.14]]></title>
            <link>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release</link>
            <guid>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release</guid>
            <pubDate>Fri, 13 Dec 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[KServe 0.14 Release Blog Post]]></description>
<content:encoded><![CDATA[<p><em>Published on December 13, 2024</em></p>
<p>We are excited to announce KServe v0.14. In this release we introduce a new Python client designed for KServe and a new model cache feature, promote OCI storage for models to a stable feature, and add support for deploying models directly from Hugging Face.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-key-features">🚀 Key Features<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release#-key-features" class="hash-link" aria-label="Direct link to 🚀 Key Features" title="Direct link to 🚀 Key Features" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="introducing-inference-client-for-python">Introducing Inference client for Python<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release#introducing-inference-client-for-python" class="hash-link" aria-label="Direct link to Introducing Inference client for Python" title="Direct link to Introducing Inference client for Python" translate="no">​</a></h3>
<p>The KServe Python SDK now includes both <a href="https://github.com/kserve/kserve/blob/v0.14.0/python/kserve/kserve/inference_client.py#L388" target="_blank" rel="noopener noreferrer" class="">REST</a> and <a href="https://github.com/kserve/kserve/blob/v0.14.0/python/kserve/kserve/inference_client.py#L61" target="_blank" rel="noopener noreferrer" class="">GRPC</a> inference clients. The new inference clients of the SDK are delivered as <strong>alpha</strong> features.</p>
<p>In line with the features documented in issue <a href="https://github.com/kserve/kserve/issues/3270" target="_blank" rel="noopener noreferrer" class="">#3270</a>, both clients have the following characteristics:</p>
<ul>
<li class="">The clients are asynchronous</li>
<li class="">Support for HTTP/2 (via <a href="https://www.python-httpx.org/" target="_blank" rel="noopener noreferrer" class="">httpx</a> library)</li>
<li class="">Support for the Open Inference Protocol v1 and v2</li>
<li class="">Allow clients to send and receive tensor data in binary format for HTTP/REST requests; see the <a href="https://kserve.github.io/archive/0.14/modelserving/data_plane/binary_tensor_data_extension/" target="_blank" rel="noopener noreferrer" class="">binary tensor data extension docs</a>.</li>
</ul>
<p>As usual, version 0.14.0 of the KServe Python SDK is <a href="https://pypi.org/project/kserve/0.14.0/" target="_blank" rel="noopener noreferrer" class="">published to PyPI</a> and can be installed via <code>pip install kserve</code>.</p>
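<p>For illustration, the sketch below assembles the Open Inference Protocol v2 request body that these clients exchange; the tensor name and the endpoint path in the comments are illustrative, and the mention of the SDK's client call is a hedged reference to the alpha API, not a verified signature.</p>

```python
# Sketch: an Open Inference Protocol v2 request body, the format the new
# async REST/GRPC clients speak. The tensor name "input-0" is illustrative.
import json


def v2_infer_request(name, shape, datatype, data):
    """Assemble a v2 /infer request body for a single input tensor."""
    return {"inputs": [{"name": name, "shape": shape,
                        "datatype": datatype, "data": data}]}


body = v2_infer_request("input-0", [1, 3], "FP32", [[0.1, 0.2, 0.3]])
print(json.dumps(body))
# POST this to http(s)://HOST/v2/models/MODEL/infer yourself, or let the
# SDK's async inference client build and send it for you (alpha API).
```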
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="support-for-oci-storage-for-models-modelcars-becomes-stable">Support for OCI storage for models (modelcars) becomes stable<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release#support-for-oci-storage-for-models-modelcars-becomes-stable" class="hash-link" aria-label="Direct link to Support for OCI storage for models (modelcars) becomes stable" title="Direct link to Support for OCI storage for models (modelcars) becomes stable" translate="no">​</a></h3>
<p>In KServe version 0.12, support for using OCI containers for model storage was introduced as an experimental feature. This allows users to store models in OCI-format containers and to publish them through OCI-compatible registries.</p>
<p>This feature was implemented by configuring the OCI model container as a sidecar in the InferenceService pod, which motivated naming the feature modelcars. The model files are made available to the model server by configuring <a href="https://kubernetes.io/docs/tasks/configure-pod-container/share-process-namespace/" target="_blank" rel="noopener noreferrer" class="">process namespace sharing</a> in the pod.</p>
<p>There was one small but important detail that remained unsolved and motivated the experimental status: since the modelcar runs as one of the main containers of the pod, there was no guarantee that it would start quickly. The model server would be unstable if it started before the modelcar, and since the model image was not prefetched, this was considered a likely condition.</p>
<p>The unstable situation has been mitigated by configuring the OCI model as an init container in addition to configuring it as a sidecar. The init container ensures that the model image is fetched before the main containers start, and this prefetching allows the modelcar sidecar to start quickly.
This stabilization is available since KServe version 0.14, where modelcars are now a stable feature.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="future-plan">Future plan<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release#future-plan" class="hash-link" aria-label="Direct link to Future plan" title="Direct link to Future plan" translate="no">​</a></h4>
<p>Modelcars is one implementation option for supporting OCI images for model storage. There are other alternatives commented in <a href="https://github.com/kserve/kserve/issues/4083" target="_blank" rel="noopener noreferrer" class="">issue #4083</a>.</p>
<p>Using volume mounts based on OCI artifacts is the optimal implementation, but this only <a href="https://kubernetes.io/blog/2024/08/16/kubernetes-1-31-image-volume-source/" target="_blank" rel="noopener noreferrer" class="">recently became possible with Kubernetes 1.31</a>, which introduced it as a native alpha feature. KServe can now evolve to use this new Kubernetes feature.</p>
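<p>As a sketch of where this could lead, the partial pod spec below uses the Kubernetes 1.31 alpha image volume source to mount an OCI model artifact directly; this is the upstream Kubernetes API (behind the <code>ImageVolume</code> feature gate), not something KServe generates today, and the image references are illustrative.</p>

```yaml
# Sketch: Kubernetes 1.31 alpha image volume source (ImageVolume feature
# gate required); both image references are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: image-volume-example
spec:
  containers:
  - name: model-server
    image: example.com/model-server:latest   # hypothetical server image
    volumeMounts:
    - name: model
      mountPath: /mnt/models
      readOnly: true
  volumes:
  - name: model
    image:                                    # alpha volume source in K8s 1.31
      reference: example.com/models/llama3:8b # hypothetical OCI model artifact
      pullPolicy: IfNotPresent
```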
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="introducing-model-cache">Introducing Model Cache<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release#introducing-model-cache" class="hash-link" aria-label="Direct link to Introducing Model Cache" title="Direct link to Introducing Model Cache" translate="no">​</a></h3>
<p>With models increasing in size, which is especially true for LLMs, pulling from storage each time a pod is created can result in unmanageable start-up times. Although OCI storage also has the benefit of model caching, its capabilities are not flexible, since cache management is delegated to the cluster.</p>
<p>The Model Cache was proposed as another alternative to enhance KServe usability with large models, and it is released in KServe v0.14 as an <strong>alpha</strong> feature.
In this release, local node storage is used for storing models, and the <code>LocalModelCache</code> custom resource controls which models are stored in the cache.
The local model cache state can always be rebuilt from the models stored on persistent storage, such as a model registry or S3.
Read the <a href="https://docs.google.com/document/d/1nao8Ws3tonO2zNAzdmXTYa0hECZNoP2SV_z9Zg0PzLA/edit" target="_blank" rel="noopener noreferrer" class="">design document for the details</a>.</p>
<p><img decoding="async" loading="lazy" alt="!localmodelcache" src="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/assets/images/localmodelcache-59f819fe261fb8fcd66a6c875a73b3d6.png" width="2462" height="1416" class="img_ev3q"></p>
<p>By caching the models, you get the following benefits:</p>
<ul>
<li class="">
<p>Minimized time for LLM pods to start serving requests.</p>
</li>
<li class="">
<p>Shared storage for pods scheduled on the same GPU node.</p>
</li>
<li class="">
<p>Efficient scaling of AI workloads without slow model server container startup.</p>
</li>
</ul>
<p>The model cache is disabled by default. To enable it, set the <code>localmodel.enabled</code> field in the <code>inferenceservice-config</code> ConfigMap.</p>
<p>You can follow the <a href="https://kserve.github.io/archive/0.14/modelserving/storage/modelcache/localmodel/" target="_blank" rel="noopener noreferrer" class="">local model cache tutorial</a> to cache LLMs on the local NVMe drives of your GPU nodes and deploy LLMs with an <code>InferenceService</code> that loads models from the local cache to accelerate container startup.</p>
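<p>A <code>LocalModelCache</code> resource along the lines of the tutorial might look like the sketch below; the field names follow our reading of the v0.14 alpha API, while the model URI, size, and node group name are illustrative, so consult the tutorial for the authoritative schema.</p>

```yaml
# Sketch of a LocalModelCache resource (alpha API); values are illustrative.
apiVersion: serving.kserve.io/v1alpha1
kind: LocalModelCache
metadata:
  name: meta-llama3-8b-instruct
spec:
  sourceModelUri: hf://meta-llama/meta-llama-3-8b-instruct  # model to cache
  modelSize: 10Gi        # space reserved on the node's local disk
  nodeGroups:
  - workers              # GPU node group whose local NVMe stores the model
```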
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="support-for-hugging-face-hub-in-storage-initializer">Support for Hugging Face hub in storage initializer<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release#support-for-hugging-face-hub-in-storage-initializer" class="hash-link" aria-label="Direct link to Support for Hugging Face hub in storage initializer" title="Direct link to Support for Hugging Face hub in storage initializer" translate="no">​</a></h3>
<p>The KServe storage initializer has been enhanced to support downloading models directly from Hugging Face. For this, the new <code>hf://</code> schema is now supported in the <code>storageUri</code> field of InferenceServices. The following partial YAML shows this:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> serving.kserve.io/v1beta1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferenceService</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> huggingface</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama3</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">predictor</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">storageUri</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> hf</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">//meta</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama/meta</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">8b</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">instruct</span><br></span></code></pre></div></div>
<p>Both public and private Hugging Face repositories are supported. The credentials can be provided by the usual mechanism of binding Secrets to ServiceAccounts, or by binding the credentials Secret as environment variables in the InferenceService.</p>
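<p>For a private repository, binding the token Secret as an environment variable can be sketched as follows; the Secret and model names are illustrative, and <code>HF_TOKEN</code> is the variable the Hugging Face libraries read when downloading:</p>

```yaml
# Sketch: private Hugging Face model with credentials from a Secret.
# Secret name and model repository are hypothetical.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-private
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      storageUri: hf://my-org/my-private-model   # hypothetical private repo
      env:
      - name: HF_TOKEN                # read by huggingface_hub on download
        valueFrom:
          secretKeyRef:
            name: hf-secret           # hypothetical Secret holding the token
            key: HF_TOKEN
```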
<p>Read the <a href="https://kserve.github.io/archive/0.14/modelserving/storage/huggingface/hf/" target="_blank" rel="noopener noreferrer" class="">documentation</a> for more details.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="️-enhancements-and-improvements">🛠️ Enhancements and Improvements<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release#%EF%B8%8F-enhancements-and-improvements" class="hash-link" aria-label="Direct link to 🛠️ Enhancements and Improvements" title="Direct link to 🛠️ Enhancements and Improvements" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="hugging-face-vllm-backend-changes">Hugging Face vLLM backend changes<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release#hugging-face-vllm-backend-changes" class="hash-link" aria-label="Direct link to Hugging Face vLLM backend changes" title="Direct link to Hugging Face vLLM backend changes" translate="no">​</a></h3>
<ul>
<li class="">Updated the vLLM backend to 0.6.1 <a href="https://github.com/kserve/kserve/pull/3948" target="_blank" rel="noopener noreferrer" class="">#3948</a></li>
<li class="">Support the <code>trust_remote_code</code> flag for vLLM <a href="https://github.com/kserve/kserve/pull/3729" target="_blank" rel="noopener noreferrer" class="">#3729</a></li>
<li class="">Support the text embedding task in the Hugging Face server <a href="https://github.com/kserve/kserve/pull/3743" target="_blank" rel="noopener noreferrer" class="">#3743</a></li>
<li class="">Add health endpoint for vLLM backend <a href="https://github.com/kserve/kserve/pull/3850" target="_blank" rel="noopener noreferrer" class="">#3850</a></li>
<li class="">Added <code>hostIPC</code> field to <code>ServingRuntime</code> CRD, for supporting more than one GPU in Serverless mode <a href="https://github.com/kserve/kserve/issues/3791" target="_blank" rel="noopener noreferrer" class="">#3791</a></li>
<li class="">Support shared memory volume for vLLM backend <a href="https://github.com/kserve/kserve/pull/3910" target="_blank" rel="noopener noreferrer" class="">#3910</a></li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="other-enhancements">Other Enhancements<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release#other-enhancements" class="hash-link" aria-label="Direct link to Other Enhancements" title="Direct link to Other Enhancements" translate="no">​</a></h3>
<ul>
<li class="">New flag for automounting the service account token <a href="https://github.com/kserve/kserve/pull/3979" target="_blank" rel="noopener noreferrer" class="">#3979</a></li>
<li class="">TLS support for inference loggers <a href="https://github.com/kserve/kserve/issues/3837" target="_blank" rel="noopener noreferrer" class="">#3837</a></li>
<li class="">Allow PVC storage to be mounted in ReadWrite mode via an annotation <a href="https://github.com/kserve/kserve/issues/3687" target="_blank" rel="noopener noreferrer" class="">#3687</a></li>
<li class="">Support HTTP Headers passing for KServe python custom runtimes <a href="https://github.com/kserve/kserve/pull/3669" target="_blank" rel="noopener noreferrer" class="">#3669</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="️-whats-changed">⚠️ What's Changed<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release#%EF%B8%8F-whats-changed" class="hash-link" aria-label="Direct link to ⚠️ What's Changed" title="Direct link to ⚠️ What's Changed" translate="no">​</a></h2>
<ul>
<li class="">Ray is now an optional dependency <a href="https://github.com/kserve/kserve/pull/3834" target="_blank" rel="noopener noreferrer" class="">#3834</a></li>
<li class="">Support for Python 3.12 is added, while support for Python 3.8 is removed <a href="https://github.com/kserve/kserve/pull/3645" target="_blank" rel="noopener noreferrer" class="">#3645</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-release-notes">🔍 Release Notes<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release#-release-notes" class="hash-link" aria-label="Direct link to 🔍 Release Notes" title="Direct link to 🔍 Release Notes" translate="no">​</a></h2>
<p>For complete release notes including all changes, bug fixes, and known issues, visit the <a href="https://github.com/kserve/kserve/releases/tag/v0.14.0" target="_blank" rel="noopener noreferrer" class="">GitHub release page</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-acknowledgments">🙏 Acknowledgments<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release#-acknowledgments" class="hash-link" aria-label="Direct link to 🙏 Acknowledgments" title="Direct link to 🙏 Acknowledgments" translate="no">​</a></h2>
<p>We want to thank all the contributors who made this release possible:</p>
<ul>
<li class=""><strong>Core Contributors</strong>: The KServe maintainers, along with regular and new contributors</li>
<li class=""><strong>Community</strong>: Everyone who reported issues, provided feedback, and tested features</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-join-the-community">🤝 Join the community<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.14-release#-join-the-community" class="hash-link" aria-label="Direct link to 🤝 Join the community" title="Direct link to 🤝 Join the community" translate="no">​</a></h2>
<ul>
<li class="">Visit our <a href="https://kserve.github.io/website/" target="_blank" rel="noopener noreferrer" class="">Website</a> or <a href="https://github.com/kserve" target="_blank" rel="noopener noreferrer" class="">GitHub</a></li>
<li class="">Join the Slack (<a href="https://github.com/kserve/community?tab=readme-ov-file#questions-and-issues" target="_blank" rel="noopener noreferrer" class="">#kserve</a>)</li>
<li class="">Attend our community meeting by subscribing to the <a href="https://zoom-lfx.platform.linuxfoundation.org/meetings/kserve?view=month" target="_blank" rel="noopener noreferrer" class="">KServe calendar</a>.</li>
<li class="">View our <a href="https://github.com/kserve/community" target="_blank" rel="noopener noreferrer" class="">community GitHub repository</a> to learn how to make contributions. We are excited to work with you to make KServe better and promote its adoption!</li>
</ul>
<hr>
<p><em>The KServe team is committed to making machine learning model serving simple, scalable, and standardized. Thank you for being part of our community!</em></p>]]></content:encoded>
            <category>Releases</category>
        </item>
        <item>
            <title><![CDATA[From Serverless Predictive Inference to Generative Inference - Introducing KServe v0.13]]></title>
            <link>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.13-release</link>
            <guid>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.13-release</guid>
            <pubDate>Wed, 15 May 2024 00:00:00 GMT</pubDate>
            <description><![CDATA[KServe 0.13 Release Blog Post]]></description>
            <content:encoded><![CDATA[<p><em>Published on May 15, 2024</em></p>
<p>We are excited to unveil KServe v0.13, marking a significant leap forward in evolving cloud native model serving to meet the demands of Generative AI inference. This release is highlighted by three pivotal updates: enhanced Hugging Face runtime, robust vLLM backend support for Generative Models, and the integration of OpenAI protocol standards.</p>
<p><img decoding="async" loading="lazy" alt="!kserve-components" src="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/assets/images/kserve-layer-08feccc0300cf8608f0a36b6572e70fb.png" width="960" height="540" class="img_ev3q"></p>
<p>Below is a summary of the key changes.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-enhanced-hugging-face-runtime-support">🚀 Enhanced Hugging Face Runtime Support<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.13-release#-enhanced-hugging-face-runtime-support" class="hash-link" aria-label="Direct link to 🚀 Enhanced Hugging Face Runtime Support" title="Direct link to 🚀 Enhanced Hugging Face Runtime Support" translate="no">​</a></h2>
<p>KServe v0.13 enriches its Hugging Face runtime and now supports running Hugging Face models out-of-the-box. KServe v0.13 implements a <a href="https://github.com/kserve/kserve/tree/release-0.13/python/huggingfaceserver" target="_blank" rel="noopener noreferrer" class="">KServe Hugging Face Serving Runtime</a>, <code>kserve-huggingfaceserver</code>. With this implementation, KServe can now automatically infer a <a href="https://huggingface.co/tasks" target="_blank" rel="noopener noreferrer" class="">task</a> from model architecture and select the optimized serving runtime. Currently supported tasks include sequence classification, token classification, fill mask, text generation, and text to text generation.</p>
<p><img decoding="async" loading="lazy" alt="!kserve-huggingface" src="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/assets/images/kserve-huggingface-209566d5f98a98d521606e57b4531a19.png" width="7243" height="2208" class="img_ev3q"></p>
<p>Here is an example of serving a BERT model by deploying an InferenceService with the Hugging Face runtime for a classification task.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> serving.kserve.io/v1beta1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferenceService</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> huggingface</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">bert</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">predictor</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">modelFormat</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> huggingface</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">args</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model_name=bert</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model_id=bert</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">base</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">uncased</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">tensor_input_names=input_ids</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">resources</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">limits</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">cpu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">memory</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 2Gi</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">nvidia.com/gpu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" 
style="color:#00a4db">requests</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">cpu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 100m</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">memory</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 2Gi</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">nvidia.com/gpu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"1"</span><br></span></code></pre></div></div>
<p>You can also deploy BERT on a more optimized inference runtime such as Triton, using the Hugging Face Runtime for pre/post processing; see more details <a href="https://kserve.github.io/archive/0.13/modelserving/v1beta1/triton/huggingface/" target="_blank" rel="noopener noreferrer" class="">here</a>.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-vllm-support">🔧 vLLM Support<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.13-release#-vllm-support" class="hash-link" aria-label="Direct link to 🔧 vLLM Support" title="Direct link to 🔧 vLLM Support" translate="no">​</a></h3>
<p>Version 0.13 introduces dedicated runtime support for <a href="https://docs.vllm.ai/en/latest/" target="_blank" rel="noopener noreferrer" class="">vLLM</a> for enhanced transformer model serving. This support includes auto-mapping vLLM as the backend for supported tasks, streamlining the deployment process and optimizing performance. If vLLM does not support a particular task, serving defaults to the Hugging Face backend. See the example below.</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> serving.kserve.io/v1beta1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferenceService</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> huggingface</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama3</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">predictor</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">modelFormat</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> huggingface</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">args</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model_name=llama3</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model_id=meta</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama/meta</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">llama</span><span class="token punctuation" 
style="color:#393A34">-</span><span class="token plain">3</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">8b</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">instruct</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">resources</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">limits</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">cpu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"6"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">memory</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 24Gi</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">nvidia.com/gpu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">requests</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">cpu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"6"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">memory</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 24Gi</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">nvidia.com/gpu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"1"</span><br></span></code></pre></div></div>
<p>See more details in our updated docs to <a href="https://kserve.github.io/archive/0.13/modelserving/v1beta1/llm/huggingface/" target="_blank" rel="noopener noreferrer" class="">Deploy the Llama3 model with Hugging Face LLM Serving Runtime</a>.</p>
<p>Additionally, if the Hugging Face backend is preferred over vLLM, vLLM auto-mapping can be disabled with the <code>--backend=huggingface</code> arg.</p>
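<p>As an illustration, the backend override is just another entry in the predictor <code>args</code>. A minimal sketch, mirroring the Llama3 example above (the service name below is a hypothetical placeholder):</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv">apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: huggingface-llama3-hf-backend
spec:
  predictor:
    model:
      modelFormat:
        name: huggingface
      args:
      - --model_name=llama3
      - --model_id=meta-llama/meta-llama-3-8b-instruct
      # disables vLLM auto-mapping in favor of the Hugging Face backend
      - --backend=huggingface
</code></pre></div></div>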
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-openai-schema-integration">🌐 OpenAI Schema Integration<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.13-release#-openai-schema-integration" class="hash-link" aria-label="Direct link to 🌐 OpenAI Schema Integration" title="Direct link to 🌐 OpenAI Schema Integration" translate="no">​</a></h3>
<p>Embracing the OpenAI protocol, KServe v0.13 now supports three specific endpoints for generative transformer models:</p>
<ul>
<li class=""><code>/openai/v1/completions</code></li>
<li class=""><code>/openai/v1/chat/completions</code></li>
<li class=""><code>/openai/v1/models</code></li>
</ul>
<p>These endpoints are useful for generative transformer models, which take in messages and return a model-generated message. The <a href="https://platform.openai.com/docs/guides/text-generation/chat-completions-api" target="_blank" rel="noopener noreferrer" class="">chat completions endpoint</a> is designed to handle multi-turn conversations easily, while still being useful for single-turn tasks. The <a href="https://platform.openai.com/docs/guides/text-generation/completions-api" target="_blank" rel="noopener noreferrer" class="">completions endpoint</a> is now a legacy endpoint; it differs from the chat completions endpoint in that its interface is a freeform text string called a <code>prompt</code>. Read more about the <a href="https://platform.openai.com/docs/api-reference/chat" target="_blank" rel="noopener noreferrer" class="">chat completions</a> and <a href="https://platform.openai.com/docs/api-reference/completions" target="_blank" rel="noopener noreferrer" class="">completions</a> endpoints in the OpenAI API docs.</p>
<p>This update fosters a standardized approach to transformer model serving, ensuring compatibility with a broader spectrum of models and tools and enhancing the platform's versatility. The API can be used directly with OpenAI's client libraries or third-party tools such as LangChain or LlamaIndex.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-future-plan">🔮 Future Plan<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.13-release#-future-plan" class="hash-link" aria-label="Direct link to 🔮 Future Plan" title="Direct link to 🔮 Future Plan" translate="no">​</a></h3>
<ul>
<li class="">Support other tasks like text embeddings <a href="https://github.com/kserve/kserve/issues/3572" target="_blank" rel="noopener noreferrer" class="">#3572</a>.</li>
<li class="">Support more LLM backend options in the future, such as TensorRT-LLM.</li>
<li class="">Enrich text generation metrics for Throughput(tokens/sec), TTFT(Time to first token) <a href="https://github.com/kserve/kserve/issues/3461" target="_blank" rel="noopener noreferrer" class="">#3461</a>.</li>
<li class="">KEDA integration for token based LLM Autoscaling <a href="https://github.com/kserve/kserve/issues/3561" target="_blank" rel="noopener noreferrer" class="">#3561</a>.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="️-other-changes">🛠️ Other Changes<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.13-release#%EF%B8%8F-other-changes" class="hash-link" aria-label="Direct link to 🛠️ Other Changes" title="Direct link to 🛠️ Other Changes" translate="no">​</a></h2>
<p>This release also includes several enhancements and changes:</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-whats-new">✨ What's New?<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.13-release#-whats-new" class="hash-link" aria-label="Direct link to ✨ What's New?" title="Direct link to ✨ What's New?" translate="no">​</a></h3>
<ul>
<li class="">Async streaming support for v1 endpoints <a href="https://github.com/kserve/kserve/issues/3402" target="_blank" rel="noopener noreferrer" class="">#3402</a>.</li>
<li class="">Support for <code>.json</code> and <code>.ubj</code> model formats in the XGBoost server image <a href="https://github.com/kserve/kserve/issues/3546" target="_blank" rel="noopener noreferrer" class="">#3546</a>.</li>
<li class="">Enhanced flexibility in KServe by allowing the configuration of multiple domains for an inference service <a href="https://github.com/kserve/kserve/issues/2747" target="_blank" rel="noopener noreferrer" class="">#2747</a>.</li>
<li class="">Enhanced the manager setup to dynamically adapt based on available CRDs, improving operational flexibility and reliability across different deployment environments <a href="https://github.com/kserve/kserve/issues/3470" target="_blank" rel="noopener noreferrer" class="">#3470</a>.</li>
</ul>
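<p>As a quick sketch of the XGBoost change above, a model saved with the native <code>.json</code> format can now be served directly (the storage path below is a hypothetical placeholder):</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv">apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: xgboost-json
spec:
  predictor:
    model:
      modelFormat:
        name: xgboost
      # assumption: the directory contains a model saved as model.json (or model.ubj)
      storageUri: gs://my-bucket/models/xgboost
</code></pre></div></div>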
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="️-whats-changed">⚠️ What's Changed?<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.13-release#%EF%B8%8F-whats-changed" class="hash-link" aria-label="Direct link to ⚠️ What's Changed?" title="Direct link to ⚠️ What's Changed?" translate="no">​</a></h3>
<ul>
<li class="">Removed Seldon Alibi dependency <a href="https://github.com/kserve/kserve/issues/3380" target="_blank" rel="noopener noreferrer" class="">#3380</a>.</li>
<li class="">Removal of conversion webhook from manifests. <a href="https://github.com/kserve/kserve/issues/3344" target="_blank" rel="noopener noreferrer" class="">#3344</a>.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-release-notes">🔍 Release Notes<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.13-release#-release-notes" class="hash-link" aria-label="Direct link to 🔍 Release Notes" title="Direct link to 🔍 Release Notes" translate="no">​</a></h2>
<p>For complete release notes including all changes, bug fixes, and known issues, visit the <a href="https://github.com/kserve/kserve/releases/tag/v0.13.0" target="_blank" rel="noopener noreferrer" class="">GitHub release page</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-acknowledgments">🙏 Acknowledgments<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.13-release#-acknowledgments" class="hash-link" aria-label="Direct link to 🙏 Acknowledgments" title="Direct link to 🙏 Acknowledgments" translate="no">​</a></h2>
<p>We want to thank all the contributors who made this release possible:</p>
<ul>
<li class=""><strong>Core Contributors</strong>: The KServe maintainers and regular as well as new contributors</li>
<li class=""><strong>Community</strong>: Everyone who reported issues, provided feedback, and tested features</li>
<li class=""><strong>Special Recognition</strong>: Contributors who helped drive the generative AI capabilities forward</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-join-the-community">🤝 Join the Community<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.13-release#-join-the-community" class="hash-link" aria-label="Direct link to 🤝 Join the Community" title="Direct link to 🤝 Join the Community" translate="no">​</a></h2>
<ul>
<li class="">Visit our <a href="https://kserve.github.io/website/" target="_blank" rel="noopener noreferrer" class="">Website</a> or <a href="https://github.com/kserve" target="_blank" rel="noopener noreferrer" class="">GitHub</a></li>
<li class="">Join the Slack (<a href="https://github.com/kserve/community?tab=readme-ov-file#questions-and-issues" target="_blank" rel="noopener noreferrer" class="">#kserve</a>)</li>
<li class="">Attend our community meeting by subscribing to the <a href="https://zoom-lfx.platform.linuxfoundation.org/meetings/kserve?view=month" target="_blank" rel="noopener noreferrer" class="">KServe calendar</a>.</li>
<li class="">View our <a href="https://github.com/kserve/community" target="_blank" rel="noopener noreferrer" class="">community github repository</a> to learn how to make contributions. We are excited to work with you to make KServe better and promote its adoption!</li>
</ul>
<hr>
<p><em>The KServe team is committed to making machine learning model serving simple, scalable, and standardized. Thank you for being part of our community!</em></p>]]></content:encoded>
            <category>Releases</category>
        </item>
        <item>
            <title><![CDATA[Announcing KServe v0.11]]></title>
            <link>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release</link>
            <guid>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release</guid>
            <pubDate>Sun, 08 Oct 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[KServe 0.11 Release Blog Post]]></description>
            <content:encoded><![CDATA[<p><em>Published on October 8, 2023</em></p>
<p>We are excited to announce the release of KServe 0.11. This release introduces Large Language Model (LLM) runtimes and brings enhancements to the KServe control plane, Open Inference Protocol support in the Python SDK, and dependency management. For ModelMesh, we have added PVC, HPA, and payload logging support to ensure feature parity with KServe.</p>
<p>Here is a summary of the key changes:</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-kserve-core-inference-enhancements">🚀 KServe Core Inference Enhancements<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release#-kserve-core-inference-enhancements" class="hash-link" aria-label="Direct link to 🚀 KServe Core Inference Enhancements" title="Direct link to 🚀 KServe Core Inference Enhancements" translate="no">​</a></h2>
<ul>
<li class="">
<p><strong>Path-based routing support</strong>, which serves as an alternative to host-based routing; the URL of the <code>InferenceService</code> looks like <code>http://&lt;ingress_domain&gt;/serving/&lt;namespace&gt;/&lt;isvc_name&gt;</code>.
Please refer to the <a href="https://github.com/kserve/kserve/blob/294a10495b6b5cda9c64d3e1573b60aec62aceb9/config/configmap/inferenceservice.yaml#L237" target="_blank" rel="noopener noreferrer" class="">doc</a> for how to enable path-based routing.</p>
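<p>As a sketch, path-based routing is driven by the <code>ingress</code> section of the KServe <code>inferenceservice-config</code> ConfigMap; the gateway and domain values below are illustrative placeholders:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv">ingress: |-
  {
    "ingressGateway": "knative-serving/knative-ingress-gateway",
    "ingressDomain": "example.com",
    "pathTemplate": "/serving/{{ .Namespace }}/{{ .Name }}"
  }
</code></pre></div></div>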
</li>
<li class="">
<p><strong>Priority field for Serving Runtime</strong> custom resource to handle the case where multiple serving runtimes support the same model format; see more details in <a href="https://kserve.github.io/archive/0.11/modelserving/servingruntimes/#priority" target="_blank" rel="noopener noreferrer" class="">the serving runtime doc</a>.</p>
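<p>For example, the priority is declared per supported model format on the runtime, and the runtime with the higher value is selected when several runtimes match. A minimal sketch (the runtime name and image here are illustrative):</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv">apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: kserve-sklearnserver
spec:
  supportedModelFormats:
  - name: sklearn
    version: "1"
    autoSelect: true
    # chosen over other matching runtimes with a lower priority value
    priority: 2
  containers:
  - name: kserve-container
    image: kserve/sklearnserver:latest
</code></pre></div></div>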
</li>
<li class="">
<p><strong>Custom Storage Container CRD</strong> to allow customized storage initializer implementations for supported storage URI prefixes; an example use case is private model registry integration:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"serving.kserve.io/v1alpha1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ClusterStorageContainer</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> default</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">container</span><span class="token punctuation" style="color:#393A34">:</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> storage</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">initializer</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> kserve/model</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">registry</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">latest</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">resources</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">requests</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">memory</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 100Mi</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">cpu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 100m</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">   
     </span><span class="token key atrule" style="color:#00a4db">limits</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">memory</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 1Gi</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">cpu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">supportedUriFormats</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">prefix</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> model</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">registry</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">//</span><br></span></code></pre></div></div>
</li>
<li class="">
<p><strong>Inference Graph enhancements</strong> improving the API spec to support pod affinity and resource requirement fields.
A <code>Dependency</code> field with options <code>Soft</code> and <code>Hard</code> is introduced to handle error responses from the inference steps and decide whether to short-circuit the request in case of errors; see the following example with a hard dependency on the node steps:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> serving.kserve.io/v1alpha1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferenceGraph</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> graph_with_switch_node</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">nodes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      
</span><span class="token key atrule" style="color:#00a4db">root</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">routerType</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Sequence</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">steps</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"rootStep1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token key atrule" style="color:#00a4db">nodeName</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> node1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token key atrule" style="color:#00a4db">dependency</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Hard</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"rootStep2"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token key atrule" style="color:#00a4db">serviceName</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> success_200_isvc_id </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">node1</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">routerType</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Switch</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">steps</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"node1Step1"</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token key atrule" style="color:#00a4db">serviceName</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> error_404_isvc_id </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token key atrule" style="color:#00a4db">condition</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"[@this].#(decision_picker==ERROR)"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">            </span><span class="token key atrule" style="color:#00a4db">dependency</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Hard</span><br></span></code></pre></div></div>
<p>For more details, please refer to the <a href="https://github.com/kserve/kserve/issues/2484" target="_blank" rel="noopener noreferrer" class="">issue</a>.</p>
</li>
<li class="">
<p><strong>Improved InferenceService debugging experience</strong> by adding the aggregated <code>RoutesReady</code> status and the <code>LastDeploymentReady</code> condition to the InferenceService status, making it possible to distinguish the endpoint status from the deployment status.
This applies to serverless mode; for more details, refer to the <a href="https://pkg.go.dev/github.com/kserve/kserve@v0.11.1/pkg/apis/serving/v1beta1#InferenceServiceStatus" target="_blank" rel="noopener noreferrer" class="">API docs</a>.</p>
</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-enhanced-python-sdk-dependency-management">📦 Enhanced Python SDK Dependency Management<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release#-enhanced-python-sdk-dependency-management" class="hash-link" aria-label="Direct link to 📦 Enhanced Python SDK Dependency Management" title="Direct link to 📦 Enhanced Python SDK Dependency Management" translate="no">​</a></h3>
<ul>
<li class="">
<p>KServe has adopted <a href="https://python-poetry.org/docs/" target="_blank" rel="noopener noreferrer" class="">poetry</a> to manage Python dependencies. You can now install the KServe SDK with locked dependencies using <code>poetry install</code>.
While <code>pip install</code> still works, we highly recommend using poetry to ensure predictable dependency management.</p>
</li>
<li class="">
<p>The KServe SDK has also been slimmed down by making the cloud storage dependency optional. If you need the storage dependencies for custom serving runtimes, you can still install them with <code>pip install kserve[storage]</code>.</p>
</li>
</ul>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-kserve-python-runtimes-improvements">🔧 KServe Python Runtimes Improvements<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release#-kserve-python-runtimes-improvements" class="hash-link" aria-label="Direct link to 🔧 KServe Python Runtimes Improvements" title="Direct link to 🔧 KServe Python Runtimes Improvements" translate="no">​</a></h3>
<ul>
<li class="">
<p>KServe Python Runtimes including <a href="https://kserve.github.io/archive/0.11/modelserving/v1beta1/sklearn/v2/" target="_blank" rel="noopener noreferrer" class="">sklearnserver</a>, <a href="https://kserve.github.io/archive/0.11/modelserving/v1beta1/lightgbm/" target="_blank" rel="noopener noreferrer" class="">lgbserver</a>, <a href="https://kserve.github.io/archive/0.11/modelserving/v1beta1/xgboost/" target="_blank" rel="noopener noreferrer" class="">xgbserver</a>
now support the open inference protocol for both REST and gRPC.</p>
</li>
<li class="">
<p>Logging improvements including adding Uvicorn access logging and a default KServe logger.</p>
</li>
<li class="">
<p>The <code>postprocess</code> handler has been aligned with the open inference protocol, hiding the complexities of the underlying transport protocol.</p>
</li>
</ul>
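<p>As a concrete sketch, an open inference protocol (v2) REST request to one of these runtimes is a JSON body containing a list of named, typed input tensors. The model name <code>sklearn-iris</code>, the endpoint host, and the input shape below are illustrative, not taken from the release notes:</p>

```python
import json

# Open inference protocol (v2) REST request body: each input tensor
# carries a name, a shape, a datatype, and the tensor data.
infer_request = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [2, 4],
            "datatype": "FP32",
            "data": [
                [6.8, 2.8, 4.8, 1.4],
                [6.0, 3.4, 4.5, 1.6],
            ],
        }
    ]
}

# The body would be POSTed to the runtime's v2 infer endpoint, e.g.:
url = "http://localhost:8080/v2/models/sklearn-iris/infer"
body = json.dumps(infer_request)
```

<p>The gRPC flavor of the protocol carries the same tensor structure as protobuf messages instead of JSON.</p>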
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-llm-runtimes">🤖 LLM Runtimes<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release#-llm-runtimes" class="hash-link" aria-label="Direct link to 🤖 LLM Runtimes" title="Direct link to 🤖 LLM Runtimes" translate="no">​</a></h3>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="torchserve-llm-runtime">TorchServe LLM Runtime<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release#torchserve-llm-runtime" class="hash-link" aria-label="Direct link to TorchServe LLM Runtime" title="Direct link to TorchServe LLM Runtime" translate="no">​</a></h4>
<p>KServe now integrates with TorchServe 0.8, offering support for <a href="https://pytorch.org/serve/large_model_inference.html" target="_blank" rel="noopener noreferrer" class="">LLM models</a> that may not fit on a single GPU.
Huggingface Accelerate and DeepSpeed are available options for partitioning the model across multiple GPUs. See the <a href="https://kserve.github.io/archive/0.11/modelserving/v1beta1/llm/torchserve/accelerate/" target="_blank" rel="noopener noreferrer" class="">detailed example</a> of how to serve an LLM on KServe with the TorchServe runtime.</p>
<h4 class="anchor anchorTargetStickyNavbar_Vzrq" id="vllm-runtime">vLLM Runtime<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release#vllm-runtime" class="hash-link" aria-label="Direct link to vLLM Runtime" title="Direct link to vLLM Runtime" translate="no">​</a></h4>
<p>Serving LLM models can be surprisingly slow even on high-end GPUs. <a href="https://github.com/vllm-project/vllm" target="_blank" rel="noopener noreferrer" class="">vLLM</a> is a fast and easy-to-use LLM inference engine that can achieve 10x-20x higher throughput than Huggingface transformers.
It supports <a href="https://www.anyscale.com/blog/continuous-batching-llm-inference" target="_blank" rel="noopener noreferrer" class="">continuous batching</a> for increased throughput and GPU utilization, and
<a href="https://vllm.ai/" target="_blank" rel="noopener noreferrer" class="">paged attention</a> to address the memory bottleneck of autoregressive decoding, where all the attention key-value tensors (the KV cache) are kept in GPU memory to generate the next tokens.</p>
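<p>To see why the KV cache becomes the memory bottleneck, a back-of-the-envelope calculation helps. The model dimensions below are illustrative (roughly a 7B-parameter transformer) and not tied to any specific runtime:</p>

```python
# Rough KV-cache size per sequence: each layer stores a key and a value
# tensor of shape (seq_len, num_heads * head_dim), here in fp16 (2 bytes).
num_layers = 32        # illustrative transformer depth
num_heads = 32         # attention heads per layer
head_dim = 128         # dimension per head
seq_len = 2048         # tokens kept in the cache
bytes_per_elem = 2     # fp16

# factor of 2: one key tensor plus one value tensor per layer
kv_bytes = 2 * num_layers * seq_len * num_heads * head_dim * bytes_per_elem
print(f"KV cache per sequence: {kv_bytes / 1024**3:.2f} GiB")
# → KV cache per sequence: 1.00 GiB
```

<p>At that rate, a batch of a few dozen concurrent sequences consumes tens of GiB of GPU memory for the cache alone. Paged attention mitigates this by allocating the cache in small, non-contiguous blocks, so memory is not reserved up front for the full maximum sequence length.</p>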
<p>The <a href="https://kserve.github.io/archive/0.11/modelserving/v1beta1/llm/vllm/" target="_blank" rel="noopener noreferrer" class="">example</a> shows how to deploy vLLM on KServe; we expect further integration in KServe 0.12 with the proposed <a href="https://github.com/kserve/open-inference-protocol/pull/7" target="_blank" rel="noopener noreferrer" class="">generate endpoint</a> for the open inference protocol.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-modelmesh-updates">📊 ModelMesh Updates<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release#-modelmesh-updates" class="hash-link" aria-label="Direct link to 📊 ModelMesh Updates" title="Direct link to 📊 ModelMesh Updates" translate="no">​</a></h2>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-storing-models-on-kubernetes-persistent-volumes-pvc">💾 Storing Models on Kubernetes Persistent Volumes (PVC)<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release#-storing-models-on-kubernetes-persistent-volumes-pvc" class="hash-link" aria-label="Direct link to 💾 Storing Models on Kubernetes Persistent Volumes (PVC)" title="Direct link to 💾 Storing Models on Kubernetes Persistent Volumes (PVC)" translate="no">​</a></h3>
<p>ModelMesh now allows you to <a href="https://github.com/kserve/modelmesh-serving/blob/main/docs/predictors/setup-storage.md#deploy-a-model-stored-on-a-persistent-volume-claim" target="_blank" rel="noopener noreferrer" class="">directly mount model files onto serving runtime pods</a>
using <a href="https://kubernetes.io/docs/concepts/storage/persistent-volumes/" target="_blank" rel="noopener noreferrer" class="">Kubernetes Persistent Volumes</a>. Depending on the selected <a href="https://kubernetes.io/docs/concepts/storage/storage-classes/" target="_blank" rel="noopener noreferrer" class="">storage solution</a>, this approach can significantly reduce latency when deploying new predictors and
potentially remove the need for cloud object storage such as AWS S3, GCS, or Azure Blob Storage altogether.</p>
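<p>A minimal sketch of an InferenceService backed by a PVC, assuming ModelMesh mode and a hypothetical claim named <code>my-models-pvc</code> (field names follow the linked setup-storage guide; verify them against your ModelMesh version):</p>

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-pvc-example
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storage:
        parameters:
          type: pvc
          name: my-models-pvc   # hypothetical PersistentVolumeClaim
        path: sklearn/mnist-svm.joblib
```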
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-horizontal-pod-autoscaling-hpa">⚡ Horizontal Pod Autoscaling (HPA)<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release#-horizontal-pod-autoscaling-hpa" class="hash-link" aria-label="Direct link to ⚡ Horizontal Pod Autoscaling (HPA)" title="Direct link to ⚡ Horizontal Pod Autoscaling (HPA)" translate="no">​</a></h3>
<p>Kubernetes <a href="https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/" target="_blank" rel="noopener noreferrer" class="">Horizontal Pod Autoscaling</a> can now be used at the serving runtime pod level. With HPA enabled, the ModelMesh controller no longer manages the number of replicas. Instead, a <code>HorizontalPodAutoscaler</code> automatically updates the serving
runtime deployment with the number of Pods to best match the demand.</p>
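<p>As an illustration, a standard Kubernetes <code>HorizontalPodAutoscaler</code> targeting a serving runtime deployment could look like the following; the deployment name and thresholds are hypothetical:</p>

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: modelmesh-runtime-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: modelmesh-serving-mlserver-1.x   # hypothetical runtime deployment
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75
```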
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-model-metrics-metrics-dashboard-payload-event-logging">📈 Model Metrics, Metrics Dashboard, Payload Event Logging<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release#-model-metrics-metrics-dashboard-payload-event-logging" class="hash-link" aria-label="Direct link to 📈 Model Metrics, Metrics Dashboard, Payload Event Logging" title="Direct link to 📈 Model Metrics, Metrics Dashboard, Payload Event Logging" translate="no">​</a></h3>
<p>ModelMesh v0.11 introduces a new configuration option to emit a subset of useful metrics at the individual model level. These metrics can help identify outlier or "heavy hitter" models and consequently fine-tune the deployments of those inference services, for example by allocating more resources or increasing the number of replicas to improve responsiveness or avoid frequent cache misses.</p>
<p>A new <a href="https://github.com/kserve/modelmesh-serving/blob/main/docs/monitoring.md#import-the-grafana-dashboard" target="_blank" rel="noopener noreferrer" class="">Grafana dashboard</a> was added to display the comprehensive set of <a href="https://github.com/kserve/modelmesh-serving/blob/main/docs/monitoring.md" target="_blank" rel="noopener noreferrer" class="">Prometheus metrics</a> like model loading
and unloading rates, internal queuing delays, capacity and usage, cache state, etc. to monitor the general health of the ModelMesh Serving deployment.</p>
<p>The new <a href="https://github.com/kserve/modelmesh/blob/main/src/main/java/com/ibm/watson/modelmesh/payload/" target="_blank" rel="noopener noreferrer" class=""><code>PayloadProcessor</code> interface</a> can be implemented to log prediction requests and responses, to create data sinks for data visualization, for model quality assessment, or for drift and outlier detection by external monitoring systems.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="️-whats-changed">⚠️ What's Changed?<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release#%EF%B8%8F-whats-changed" class="hash-link" aria-label="Direct link to ⚠️ What's Changed?" title="Direct link to ⚠️ What's Changed?" translate="no">​</a></h2>
<ul>
<li class="">
<p>To allow longer InferenceService names despite DNS max-length limits (see the <a href="https://github.com/kserve/kserve/issues/1397" target="_blank" rel="noopener noreferrer" class="">issue</a>), the <code>Default</code> suffix has been removed from the inference service component (predictor/transformer/explainer) names for newly created InferenceServices.
This affects clients that use the component URL directly instead of the top-level InferenceService URL.</p>
</li>
<li class="">
<p>Status.address.url is now consistent between serverless and raw deployment modes: the URL path portion has been dropped in serverless mode.</p>
</li>
<li class="">
<p>Raw bytes are now accepted in the v1 protocol. For a JSON payload to be recognized and decoded, the <code>content-type</code> header must be set to <code>application/json</code>, for example:</p>
</li>
</ul>
<div class="language-bash codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-bash codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token plain">curl -v -H "Content-Type: application/json" http://sklearn-iris.kserve-test.${CUSTOM_DOMAIN}/v1/models/sklearn-iris:predict -d @./iris-input.json</span><br></span></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-release-notes">🔍 Release Notes<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release#-release-notes" class="hash-link" aria-label="Direct link to 🔍 Release Notes" title="Direct link to 🔍 Release Notes" translate="no">​</a></h2>
<p>For complete release notes including all changes, bug fixes, and known issues, visit the GitHub release pages for <a href="https://github.com/kserve/kserve/releases/tag/v0.11.0" target="_blank" rel="noopener noreferrer" class="">KServe v0.11</a> and <a href="https://github.com/kserve/modelmesh-serving/releases/tag/v0.11.0" target="_blank" rel="noopener noreferrer" class="">ModelMesh v0.11</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-acknowledgments">🙏 Acknowledgments<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release#-acknowledgments" class="hash-link" aria-label="Direct link to 🙏 Acknowledgments" title="Direct link to 🙏 Acknowledgments" translate="no">​</a></h2>
<p>We want to thank all the contributors who made this release possible:</p>
<ul>
<li class=""><strong>Core Contributors</strong>: The KServe maintainers and regular as well as new contributors</li>
<li class=""><strong>Community</strong>: Everyone who reported issues, provided feedback, and tested features</li>
<li class=""><strong>Working Group</strong>: All members of the KServe Working Group for their ongoing collaboration</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-join-the-community">🤝 Join the Community<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.11-release#-join-the-community" class="hash-link" aria-label="Direct link to 🤝 Join the Community" title="Direct link to 🤝 Join the Community" translate="no">​</a></h2>
<ul>
<li class="">Visit our <a href="https://kserve.github.io/website/" target="_blank" rel="noopener noreferrer" class="">Website</a> or <a href="https://github.com/kserve" target="_blank" rel="noopener noreferrer" class="">GitHub</a></li>
<li class="">Join the Slack (<a href="https://github.com/kserve/community?tab=readme-ov-file#questions-and-issues" target="_blank" rel="noopener noreferrer" class="">#kserve</a>)</li>
<li class="">Attend our community meeting by subscribing to the <a href="https://zoom-lfx.platform.linuxfoundation.org/meetings/kserve?view=month" target="_blank" rel="noopener noreferrer" class="">KServe calendar</a>.</li>
<li class="">View our <a href="https://github.com/kserve/community" target="_blank" rel="noopener noreferrer" class="">community github repository</a> to learn how to make contributions. We are excited to work with you to make KServe better and promote its adoption!</li>
</ul>
<hr>
<p><em>The KServe team is committed to making machine learning model serving simple, scalable, and standardized. Thank you for being part of our community!</em></p>]]></content:encoded>
            <category>Releases</category>
        </item>
        <item>
            <title><![CDATA[Announcing KServe v0.10.0]]></title>
            <link>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.10-release</link>
            <guid>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.10-release</guid>
            <pubDate>Sun, 05 Feb 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[KServe 0.10 Release Blog Post]]></description>
            <content:encoded><![CDATA[<p><em>Published on February 5, 2023</em></p>
<p>We are excited to announce the KServe 0.10 release. In this release, we have enabled more KServe networking options, improved KServe telemetry for supported serving runtimes, and increased support coverage for the <a href="https://kserve.github.io/archive/0.10/modelserving/data_plane/v2_protocol/" target="_blank" rel="noopener noreferrer" class="">Open (aka v2) inference protocol</a> for both standard and ModelMesh InferenceServices.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-kserve-networking-options">🌐 KServe Networking Options<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.10-release#-kserve-networking-options" class="hash-link" aria-label="Direct link to 🌐 KServe Networking Options" title="Direct link to 🌐 KServe Networking Options" translate="no">​</a></h2>
<p>Istio is now optional for both <a href="https://kserve.github.io/archive/0.10/admin/serverless/serverless/" target="_blank" rel="noopener noreferrer" class="">Serverless</a> and <a href="https://kserve.github.io/archive/0.10/admin/kubernetes_deployment/" target="_blank" rel="noopener noreferrer" class="">RawDeployment</a> mode. Please see the <a href="https://kserve.github.io/archive/0.10/admin/serverless/kourier_networking/" target="_blank" rel="noopener noreferrer" class="">alternative networking guide</a> for how you can enable other ingress options supported by Knative with Serverless mode.
For Istio users, if you want to turn on full service mesh mode to secure InferenceService with mutual TLS and enable the traffic policies, please read the <a href="https://kserve.github.io/archive/0.10/admin/serverless/servicemesh/" target="_blank" rel="noopener noreferrer" class="">service mesh setup guideline</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-kserve-telemetry-for-serving-runtimes">📊 KServe Telemetry for Serving Runtimes<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.10-release#-kserve-telemetry-for-serving-runtimes" class="hash-link" aria-label="Direct link to 📊 KServe Telemetry for Serving Runtimes" title="Direct link to 📊 KServe Telemetry for Serving Runtimes" translate="no">​</a></h2>
<p>We have instrumented additional latency metrics in KServe Python ServingRuntimes for <code>preprocess</code>, <code>predict</code> and <code>postprocess</code> handlers.
In Serverless mode we have extended Knative <code>queue-proxy</code> to enable metrics aggregation for both metrics exposed in <code>queue-proxy</code> and <code>kserve-container</code> from each <code>ServingRuntime</code>.
Please read the <a href="https://kserve.github.io/archive/0.10/modelserving/observability/prometheus_metrics/" target="_blank" rel="noopener noreferrer" class="">prometheus metrics setup guideline</a> for how to enable the metrics scraping and aggregations.</p>
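<p>In practice, both behaviors are enabled with annotations on the InferenceService. The annotation names below follow the linked metrics guide (verify them against your KServe version), and the model URI is the standard KServe sklearn example:</p>

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
  annotations:
    serving.kserve.io/enable-metric-aggregation: "true"
    serving.kserve.io/enable-prometheus-scraping: "true"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn
      storageUri: gs://kfserving-examples/models/sklearn/1.0/model
```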
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-openv2-inference-protocol-support-coverage">🚀 Open(v2) Inference Protocol Support Coverage<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.10-release#-openv2-inference-protocol-support-coverage" class="hash-link" aria-label="Direct link to 🚀 Open(v2) Inference Protocol Support Coverage" title="Direct link to 🚀 Open(v2) Inference Protocol Support Coverage" translate="no">​</a></h2>
<p>Adoption of the <code>KServe v2 Inference Protocol</code> has been increasing: the <a href="https://kserve.github.io/archive/0.10/modelserving/v1beta1/amd/" target="_blank" rel="noopener noreferrer" class="">AMD Inference ServingRuntime</a>
supports FPGAs, and OpenVINO now provides a KServe-compatible <a href="https://docs.openvino.ai/latest/ovms_docs_rest_api_kfs.html" target="_blank" rel="noopener noreferrer" class="">REST</a> and <a href="https://docs.openvino.ai/latest/ovms_docs_grpc_api_kfs.html" target="_blank" rel="noopener noreferrer" class="">gRPC</a> API.
In <a href="https://github.com/kserve/kserve/issues/2663" target="_blank" rel="noopener noreferrer" class="">the issue</a> we have proposed renaming it to the <code>KServe Open Inference Protocol</code>.</p>
<p>In KServe 0.10, we have added Open(v2) inference protocol support for KServe custom runtimes.
Now, you can enable v2 REST/gRPC for both custom transformer and predictor with images built by implementing KServe Python SDK API.
gRPC enables a high-performance inference data plane: it is built on top of HTTP/2 and binary data transport, which is more efficient over the wire than REST.
Please see the detailed example for <a href="https://kserve.github.io/archive/0.10/modelserving/v1beta1/transformer/torchserve_image_transformer/" target="_blank" rel="noopener noreferrer" class="">transformer</a> and
<a href="https://kserve.github.io/archive/0.10/modelserving/v1beta1/custom/custom_model/" target="_blank" rel="noopener noreferrer" class="">predictor</a>.</p>
<div class="language-python codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-python codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token keyword" style="color:#00009f">from</span><span class="token plain"> kserve </span><span class="token keyword" style="color:#00009f">import</span><span class="token plain"> Model</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">image_transform</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">byte_array</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    image_processing </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> transforms</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">Compose</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        transforms</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">ToTensor</span><span class="token punctuation" style="color:#393A34">(</span><span class="token 
punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        transforms</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">Normalize</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">0.1307</span><span class="token punctuation" style="color:#393A34">,</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">(</span><span class="token number" style="color:#36acaa">0.3081</span><span class="token punctuation" style="color:#393A34">,</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    image </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> Image</span><span class="token punctuation" style="color:#393A34">.</span><span class="token builtin">open</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">io</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">BytesIO</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">byte_array</span><span class="token punctuation" 
style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    tensor </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> image_processing</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">image</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">numpy</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> tensor</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">CustomModel</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">Model</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">predict</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token 
plain"> request</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferRequest</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> headers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Dict</span><span class="token punctuation" style="color:#393A34">[</span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> InferResponse</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        input_tensors </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">image_transform</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">instance</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> instance </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> request</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">inputs</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span 
class="token punctuation" style="color:#393A34">.</span><span class="token plain">data</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        input_tensors </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> np</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">asarray</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">input_tensors</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        output </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">model</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">input_tensors</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        torch</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">nn</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">functional</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">softmax</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">output</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> dim</span><span class="token operator" style="color:#393A34">=</span><span class="token number" style="color:#36acaa">1</span><span class="token punctuation" style="color:#393A34">)</span><span 
class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        values</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> top_5 </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> torch</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">topk</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">output</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">5</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        result </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> values</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">flatten</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">tolist</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        response_id </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> generate_uuid</span><span class="token punctuation" style="color:#393A34">(</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        infer_output </span><span class="token operator" 
style="color:#393A34">=</span><span class="token plain"> InferOutput</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"output-0"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> shape</span><span class="token operator" style="color:#393A34">=</span><span class="token builtin">list</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">values</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">shape</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> datatype</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"FP32"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> data</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">result</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        infer_response </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> InferResponse</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model_name</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">name</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> infer_outputs</span><span class="token operator" style="color:#393A34">=</span><span class="token punctuation" 
style="color:#393A34">[</span><span class="token plain">infer_output</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> response_id</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">response_id</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> infer_response</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain" style="display:inline-block"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token keyword" style="color:#00009f">class</span><span class="token plain"> </span><span class="token class-name">CustomTransformer</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">Model</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token keyword" style="color:#00009f">def</span><span class="token plain"> </span><span class="token function" style="color:#d73a49">preprocess</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> request</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferRequest</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> headers</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"> Dict</span><span class="token punctuation" style="color:#393A34">[</span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token builtin">str</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token operator" style="color:#393A34">-</span><span class="token operator" style="color:#393A34">&gt;</span><span class="token plain"> InferRequest</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        input_tensors </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">image_transform</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">instance</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"> </span><span class="token keyword" style="color:#00009f">for</span><span class="token plain"> instance </span><span class="token keyword" style="color:#00009f">in</span><span class="token plain"> request</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">inputs</span><span class="token punctuation" style="color:#393A34">[</span><span class="token number" style="color:#36acaa">0</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">data</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token 
plain">        input_tensors </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> np</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">asarray</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">input_tensors</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        infer_inputs </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain">InferInput</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">name</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">"INPUT__0"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> datatype</span><span class="token operator" style="color:#393A34">=</span><span class="token string" style="color:#e3116c">'FP32'</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> shape</span><span class="token operator" style="color:#393A34">=</span><span class="token builtin">list</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">input_tensors</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">shape</span><span class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">                                   data</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">input_tensors</span><span 
class="token punctuation" style="color:#393A34">)</span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        infer_request </span><span class="token operator" style="color:#393A34">=</span><span class="token plain"> InferRequest</span><span class="token punctuation" style="color:#393A34">(</span><span class="token plain">model_name</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">self</span><span class="token punctuation" style="color:#393A34">.</span><span class="token plain">model_name</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> infer_inputs</span><span class="token operator" style="color:#393A34">=</span><span class="token plain">infer_inputs</span><span class="token punctuation" style="color:#393A34">)</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token keyword" style="color:#00009f">return</span><span class="token plain"> infer_request</span><br></span></code></pre></div></div>
<p>You can use the same Python API types <code>InferRequest</code> and <code>InferResponse</code> for both the REST and gRPC protocols; KServe handles the underlying decoding and encoding according to the protocol in use.</p>
<p>⚠️ <strong>Warning</strong>: A new <code>headers</code> argument has been added to the custom handlers to pass HTTP/gRPC headers or other metadata. You can also use it as a context dict to pass data between handlers.
If you have an existing custom transformer or predictor, you must now add the <code>headers</code> argument to the <code>preprocess</code>, <code>predict</code>, and <code>postprocess</code> handlers.</p>
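<p>A minimal sketch of the updated handler signatures, shown standalone without the <code>kserve.Model</code> base class and with hypothetical payloads and header keys:</p>

```python
from typing import Any, Dict


class CustomModel:
    """Illustrates the new `headers` argument on each handler (sketch only)."""

    def preprocess(self, payload: Any, headers: Dict[str, str]) -> Any:
        # headers carries HTTP/gRPC metadata; it can also act as a context
        # dict for passing data between handlers.
        headers["x-preprocessed"] = "true"
        return payload

    def predict(self, payload: Any, headers: Dict[str, str]) -> Any:
        return {"predictions": payload}

    def postprocess(self, result: Any, headers: Dict[str, str]) -> Any:
        # The context value set in preprocess is visible here.
        result["preprocessed"] = headers.get("x-preprocessed")
        return result


model = CustomModel()
headers: Dict[str, str] = {}
out = model.postprocess(model.predict(model.preprocess([1, 2], headers), headers), headers)
```

In a real transformer or predictor, the class would subclass <code>kserve.Model</code> and the payloads would be <code>InferRequest</code>/<code>InferResponse</code> objects as shown above.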
<p>Please check the following matrix for supported ModelFormats and <a href="https://kserve.github.io/archive/0.10/modelserving/v1beta1/serving_runtime/" target="_blank" rel="noopener noreferrer" class="">ServingRuntimes</a>.</p>
<table><thead><tr><th>Model Format</th><th>v1</th><th>Open(v2) REST/gRPC</th></tr></thead><tbody><tr><td>Tensorflow</td><td>✅ TFServing</td><td>✅ Triton</td></tr><tr><td>PyTorch</td><td>✅ TorchServe</td><td>✅ TorchServe</td></tr><tr><td>TorchScript</td><td>✅ TorchServe</td><td>✅ Triton</td></tr><tr><td>ONNX</td><td>❌</td><td>✅ Triton</td></tr><tr><td>Scikit-learn</td><td>✅ KServe</td><td>✅ MLServer</td></tr><tr><td>XGBoost</td><td>✅ KServe</td><td>✅ MLServer</td></tr><tr><td>LightGBM</td><td>✅ KServe</td><td>✅ MLServer</td></tr><tr><td>MLFlow</td><td>❌</td><td>✅ MLServer</td></tr><tr><td>Custom</td><td>✅ KServe</td><td>✅ KServe</td></tr></tbody></table>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="️-multi-arch-image-support">🏗️ Multi-Arch Image Support<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.10-release#%EF%B8%8F-multi-arch-image-support" class="hash-link" aria-label="Direct link to 🏗️ Multi-Arch Image Support" title="Direct link to 🏗️ Multi-Arch Image Support" translate="no">​</a></h2>
<p>The KServe control plane images <a href="https://hub.docker.com/r/kserve/kserve-controller/tags" target="_blank" rel="noopener noreferrer" class="">kserve-controller</a>,
<a href="https://hub.docker.com/r/kserve/agent/tags" target="_blank" rel="noopener noreferrer" class="">kserve/agent</a>, and <a href="https://hub.docker.com/r/kserve/router/tags" target="_blank" rel="noopener noreferrer" class="">kserve/router</a> are now built
for multiple architectures: <code>ppc64le</code>, <code>arm64</code>, <code>amd64</code>, and <code>s390x</code>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-kserve-storage-credentials-support">🔐 KServe Storage Credentials Support<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.10-release#-kserve-storage-credentials-support" class="hash-link" aria-label="Direct link to 🔐 KServe Storage Credentials Support" title="Direct link to 🔐 KServe Storage Credentials Support" translate="no">​</a></h2>
<ul>
<li class="">Currently, AWS users need to create a secret with long-term static IAM credentials to download models stored in S3.
The security best practice is to use an <a href="https://aws.amazon.com/blogs/opensource/introducing-fine-grained-iam-roles-service-accounts/" target="_blank" rel="noopener noreferrer" class="">IAM role for service accounts (IRSA)</a>,
which enables automatic credential rotation and fine-grained access control; see how to <a href="https://kserve.github.io/archive/0.10/modelserving/storage/s3/s3/#create-service-account-with-iam-role" target="_blank" rel="noopener noreferrer" class="">set up IRSA</a>.</li>
<li class="">Azure Blob Storage is now supported with <a href="https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/how-manage-user-assigned-managed-identities?pivots=identity-mi-methods-azcli" target="_blank" rel="noopener noreferrer" class="">managed identity</a>.</li>
</ul>
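<p>As an illustration, an IRSA-based setup might look like the following sketch; the role ARN, bucket path, and resource names are placeholders:</p>

```yaml
# Service account annotated with an IAM role (placeholder ARN).
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sa-with-irsa
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/s3-model-reader
---
# InferenceService downloading its model from S3 via the annotated service account.
apiVersion: "serving.kserve.io/v1beta1"
kind: InferenceService
metadata:
  name: sklearn-s3
spec:
  predictor:
    serviceAccountName: sa-with-irsa
    model:
      modelFormat:
        name: sklearn
      storageUri: s3://my-models/sklearn/iris
```

With this configuration the storage initializer obtains temporary credentials through the pod's service account instead of a static secret.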
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-modelmesh-updates">📊 ModelMesh Updates<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.10-release#-modelmesh-updates" class="hash-link" aria-label="Direct link to 📊 ModelMesh Updates" title="Direct link to 📊 ModelMesh Updates" translate="no">​</a></h2>
<p>ModelMesh has continued its integration as KServe's multi-model serving backend, introducing improvements and features that better align the two projects. For example, it now supports <code>ClusterServingRuntimes</code>, allowing the use of cluster-scoped ServingRuntimes, a capability originally introduced in KServe 0.8.</p>
<p>Additionally, ModelMesh introduced support for TorchServe, enabling users to serve arbitrary PyTorch models (e.g., eager-mode models) in the context of distributed multi-model serving.</p>
<p>Other limitations have been addressed as well, such as support for BYTES/string tensors in the REST inference API for requests that require them.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-release-notes">🔍 Release Notes<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.10-release#-release-notes" class="hash-link" aria-label="Direct link to 🔍 Release Notes" title="Direct link to 🔍 Release Notes" translate="no">​</a></h2>
<p>For complete release notes including all changes, bug fixes, and known issues, visit the <a href="https://github.com/kserve/kserve/releases/tag/v0.10.0" target="_blank" rel="noopener noreferrer" class="">GitHub release pages</a> for KServe v0.10 and <a href="https://github.com/kserve/modelmesh-serving/releases/tag/v0.10.0" target="_blank" rel="noopener noreferrer" class="">ModelMesh v0.10</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-acknowledgments">🙏 Acknowledgments<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.10-release#-acknowledgments" class="hash-link" aria-label="Direct link to 🙏 Acknowledgments" title="Direct link to 🙏 Acknowledgments" translate="no">​</a></h2>
<p>We want to thank all the contributors who made this release possible:</p>
<p><strong>Individual Contributors:</strong></p>
<ul>
<li class=""><a href="https://github.com/sel" target="_blank" rel="noopener noreferrer" class="">Steve Larkin</a></li>
<li class=""><a href="https://github.com/stephanschielke" target="_blank" rel="noopener noreferrer" class="">Stephan Schielke</a></li>
<li class=""><a href="https://github.com/cmaddalozzo" target="_blank" rel="noopener noreferrer" class="">Curtis Maddalozzo</a></li>
<li class=""><a href="https://github.com/laozc" target="_blank" rel="noopener noreferrer" class="">Zhongcheng Lao</a></li>
<li class=""><a href="https://github.com/dimara" target="_blank" rel="noopener noreferrer" class="">Dimitris Aragiorgis</a></li>
<li class=""><a href="https://github.com/panli889" target="_blank" rel="noopener noreferrer" class="">Pan Li</a></li>
<li class=""><a href="https://github.com/tjandy98" target="_blank" rel="noopener noreferrer" class="">tjandy98</a></li>
<li class=""><a href="https://github.com/sukumargaonkar" target="_blank" rel="noopener noreferrer" class="">Sukumar Gaonkar</a></li>
<li class=""><a href="https://github.com/rachitchauhan43" target="_blank" rel="noopener noreferrer" class="">Rachit Chauhan</a></li>
<li class=""><a href="https://github.com/rafvasq" target="_blank" rel="noopener noreferrer" class="">Rafael Vasquez</a></li>
<li class=""><a href="https://github.com/TimKleinloog" target="_blank" rel="noopener noreferrer" class="">Tim Kleinloog</a></li>
<li class=""><a href="https://github.com/ckadner" target="_blank" rel="noopener noreferrer" class="">Christian Kadner</a></li>
<li class=""><a href="https://github.com/ddelange" target="_blank" rel="noopener noreferrer" class="">ddelange</a></li>
<li class=""><a href="https://github.com/lizzzcai" target="_blank" rel="noopener noreferrer" class="">Lize Cai</a></li>
<li class=""><a href="https://github.com/park12sj" target="_blank" rel="noopener noreferrer" class="">sangjune.park</a></li>
<li class=""><a href="https://github.com/Suresh-Nakkeran" target="_blank" rel="noopener noreferrer" class="">Suresh Nakkeran</a></li>
<li class=""><a href="https://github.com/MessKon" target="_blank" rel="noopener noreferrer" class="">Konstantinos Messis</a></li>
<li class=""><a href="https://github.com/matty-rose" target="_blank" rel="noopener noreferrer" class="">Matt Rose</a></li>
<li class=""><a href="https://github.com/alexagriffith" target="_blank" rel="noopener noreferrer" class="">Alexa Griffith</a></li>
<li class=""><a href="https://github.com/jagadeeshi2i" target="_blank" rel="noopener noreferrer" class="">Jagadeesh J</a></li>
<li class=""><a href="https://github.com/alembiewski" target="_blank" rel="noopener noreferrer" class="">Alex Lembiyeuski</a></li>
<li class=""><a href="https://github.com/tenzen-y" target="_blank" rel="noopener noreferrer" class="">Yuki Iwai</a></li>
<li class=""><a href="https://github.com/andyi2it" target="_blank" rel="noopener noreferrer" class="">Andrews Arokiam</a></li>
<li class=""><a href="https://github.com/xfu83" target="_blank" rel="noopener noreferrer" class="">Xin Fu</a></li>
<li class=""><a href="https://github.com/adilhusain-s" target="_blank" rel="noopener noreferrer" class="">adilhusain-s</a></li>
<li class=""><a href="https://github.com/pranavpandit1" target="_blank" rel="noopener noreferrer" class="">Pranav Pandit</a></li>
<li class=""><a href="https://github.com/C1berwiz" target="_blank" rel="noopener noreferrer" class="">C1berwiz</a></li>
<li class=""><a href="https://github.com/dilverse" target="_blank" rel="noopener noreferrer" class="">dilverse</a></li>
<li class=""><a href="https://github.com/terrytangyuan" target="_blank" rel="noopener noreferrer" class="">Yuan Tang</a></li>
<li class=""><a href="https://github.com/yuzisun" target="_blank" rel="noopener noreferrer" class="">Dan Sun</a></li>
<li class=""><a href="https://github.com/njhill" target="_blank" rel="noopener noreferrer" class="">Nick Hill</a></li>
</ul>
<p><strong>Core Contributors</strong>: The KServe maintainers and working group members</p>
<p><strong>Community</strong>: Everyone who reported issues, provided feedback, and tested features</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-join-the-community">🤝 Join the Community<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.10-release#-join-the-community" class="hash-link" aria-label="Direct link to 🤝 Join the Community" title="Direct link to 🤝 Join the Community" translate="no">​</a></h2>
<ul>
<li class="">Visit our <a href="https://kserve.github.io/website/" target="_blank" rel="noopener noreferrer" class="">Website</a> or <a href="https://github.com/kserve" target="_blank" rel="noopener noreferrer" class="">GitHub</a></li>
<li class="">Join the Slack (<a href="https://github.com/kserve/community?tab=readme-ov-file#questions-and-issues" target="_blank" rel="noopener noreferrer" class="">#kserve</a>)</li>
<li class="">Attend our community meeting by subscribing to the <a href="https://zoom-lfx.platform.linuxfoundation.org/meetings/kserve?view=month" target="_blank" rel="noopener noreferrer" class="">KServe calendar</a>.</li>
<li class="">View our <a href="https://github.com/kserve/community" target="_blank" rel="noopener noreferrer" class="">community GitHub repository</a> to learn how to contribute. We are excited to work with you to make KServe better and promote its adoption!</li>
</ul>
<hr>
<p><em>The KServe team is committed to making machine learning model serving simple, scalable, and standardized. Thank you for being part of our community!</em></p>]]></content:encoded>
            <category>Releases</category>
        </item>
        <item>
            <title><![CDATA[Announcing KServe v0.9.0]]></title>
            <link>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.9-release</link>
            <guid>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.9-release</guid>
            <pubDate>Thu, 21 Jul 2022 00:00:00 GMT</pubDate>
            <description><![CDATA[KServe 0.9 Release Blog Post]]></description>
            <content:encoded><![CDATA[<p><em>Published on July 21, 2022</em></p>
<p>Today, we are pleased to announce the v0.9.0 release of KServe! <a href="https://github.com/kserve" target="_blank" rel="noopener noreferrer" class="">KServe</a> has now fully onboarded to <a href="https://lfaidata.foundation/" target="_blank" rel="noopener noreferrer" class="">LF AI &amp; Data Foundation</a> as an <a href="https://lfaidata.foundation/projects/kserve" target="_blank" rel="noopener noreferrer" class="">Incubation Project</a>! 🎉</p>
<p>In this release we are excited to introduce the new <code>InferenceGraph</code> feature, which the community has long requested. Continuing the effort from the last release to unify the InferenceService API for deploying models on KServe and ModelMesh, ModelMesh is now fully compatible with the KServe InferenceService API!</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-introducing-inferencegraph">🚀 Introducing InferenceGraph<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.9-release#-introducing-inferencegraph" class="hash-link" aria-label="Direct link to 🚀 Introducing InferenceGraph" title="Direct link to 🚀 Introducing InferenceGraph" translate="no">​</a></h2>
<p>ML inference systems are becoming larger and more complex, often requiring many models to make a single prediction.
Common use cases include image classification and multi-stage natural language processing pipelines. For example, an image classification pipeline may need to run a top-level classifier first and then a more specialized downstream classifier based on the previous prediction result.</p>
<p>KServe is uniquely positioned to build distributed inference graphs thanks to its native integration with InferenceServices, its standard inference protocol for chaining models, and its serverless auto-scaling capabilities. KServe leverages these strengths in the InferenceGraph, enabling users to deploy complex ML inference pipelines to production in a declarative and scalable way.</p>
<p>An <strong>InferenceGraph</strong> is made up of a list of routing nodes, each consisting of a set of routing steps. A step can route either to an InferenceService or to another node defined in the graph, which makes the InferenceGraph highly composable.
The graph router is deployed behind an HTTP endpoint and can be scaled dynamically based on request volume. The InferenceGraph supports four types of routing nodes: <strong>Sequence</strong>, <strong>Switch</strong>, <strong>Ensemble</strong>, and <strong>Splitter</strong>.</p>
<p><img decoding="async" loading="lazy" alt="InferenceGraph" src="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/assets/images/inference_graph-c394dbbe6fb6a1ff7f03706f82566247.png" width="1962" height="834" class="img_ev3q"></p>
<ul>
<li class=""><strong>Sequence Node</strong>: Allows users to define multiple <code>Steps</code> with <code>InferenceServices</code> or <code>Nodes</code> as routing targets, executed in sequence. The request/response from the previous step can be passed to the next step as input, depending on the configuration.</li>
<li class=""><strong>Switch Node</strong>: Allows users to define routing conditions and selects a <code>Step</code> to execute when its condition matches. The response is returned from the first step whose condition matches; if no condition matches, the graph returns the original request.</li>
<li class=""><strong>Ensemble Node</strong>: A model ensemble scores each model separately and then combines the results into a single prediction response. Different combination methods can be used to produce the final result: multiple classification trees, for example, are commonly combined using a "majority vote" method, while multiple regression trees are often combined with various averaging techniques.</li>
<li class=""><strong>Splitter Node</strong>: Allows users to split traffic across multiple targets using a weighted distribution.</li>
</ul>
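<p>The combination methods mentioned for Ensemble nodes can be sketched in plain Python; these are hypothetical helpers for illustration, not part of KServe:</p>

```python
from collections import Counter
from typing import List


def majority_vote(labels: List[str]) -> str:
    """Combine classifier outputs by picking the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def average_vote(scores: List[float]) -> float:
    """Combine regression outputs by simple averaging."""
    return sum(scores) / len(scores)


winner = majority_vote(["dog", "cat", "dog"])  # most frequent label wins
blended = average_vote([0.2, 0.4, 0.6])        # mean of the regressors
```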
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"serving.kserve.io/v1beta1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"InferenceService"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"cat-dog-classifier"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key 
atrule" style="color:#00a4db">predictor</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">pytorch</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">resources</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">requests</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">cpu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 100m</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">storageUri</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">//kfserving</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">examples/models/torchserve/cat_dog_classification</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">---</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token 
key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"serving.kserve.io/v1beta1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"InferenceService"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"dog-breed-classifier"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">predictor</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">pytorch</span><span class="token punctuation" style="color:#393A34">:</span><span class="token 
plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">resources</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">requests</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">cpu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 100m</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">storageUri</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> gs</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">//kfserving</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">examples/models/torchserve/dog_breed_classification</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">---</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"serving.kserve.io/v1alpha1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key 
atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"InferenceGraph"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"dog-breed-pipeline"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">nodes</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">root</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">routerType</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> Sequence</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span 
class="token key atrule" style="color:#00a4db">steps</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">serviceName</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> cat</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">dog</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">classifier</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> cat_dog_classifier </span><span class="token comment" style="color:#999988;font-style:italic"># step name</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">serviceName</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> dog</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">breed</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">classifier</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> dog_breed_classifier</span><br></span><span class="token-line" 
style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">data</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> $request</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">condition</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"[@this].#(predictions.0==\"dog\")"</span><br></span></code></pre></div></div>
<p>Currently, <code>InferenceGraph</code> is supported in the <code>Serverless</code> deployment mode. You can try it out by following the <a href="https://kserve.github.io/archive/0.9/modelserving/inference_graph/image_pipeline/" target="_blank" rel="noopener noreferrer" class="">tutorial</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-inferenceservice-api-for-modelmesh">🔗 InferenceService API for ModelMesh<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.9-release#-inferenceservice-api-for-modelmesh" class="hash-link" aria-label="Direct link to 🔗 InferenceService API for ModelMesh" title="Direct link to 🔗 InferenceService API for ModelMesh" translate="no">​</a></h2>
<p>The InferenceService CRD is now the primary interface for interacting with ModelMesh. Some changes were made to the InferenceService spec to better facilitate ModelMesh's needs.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-storage-spec">💾 Storage Spec<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.9-release#-storage-spec" class="hash-link" aria-label="Direct link to 💾 Storage Spec" title="Direct link to 💾 Storage Spec" translate="no">​</a></h3>
<p>To unify how model storage is defined for both single and multi-model serving, a new storage spec was added to the predictor model spec. With this storage spec, users can specify a key inside a common secret holding config/credentials for each of the storage backends from which models can be loaded. Example:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">storage</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> localMinIO </span><span class="token comment" style="color:#999988;font-style:italic"># Credential key for the destination storage in the common secret</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">path</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> sklearn </span><span class="token comment" style="color:#999988;font-style:italic"># Model path inside the bucket</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token comment" style="color:#999988;font-style:italic"># schemaPath: null # Optional schema files for payload schema</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">parameters</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token comment" style="color:#999988;font-style:italic"># Parameters to override the default 
values inside the common secret.</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">bucket</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> example</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">models</span><br></span></code></pre></div></div>
<p>Learn more <a href="https://github.com/kserve/kserve/tree/release-0.9/docs/samples/storage/storageSpec" target="_blank" rel="noopener noreferrer" class="">here</a>.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-model-status">📊 Model Status<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.9-release#-model-status" class="hash-link" aria-label="Direct link to 📊 Model Status" title="Direct link to 📊 Model Status" translate="no">​</a></h3>
<p>For further alignment between ModelMesh and KServe, some additions to the InferenceService status were made. There is now a <code>Model Status</code> section which contains information about the model loaded in the predictor. New fields include:</p>
<ul>
<li class=""><code>states</code> - State information of the predictor's model.</li>
<li class=""><code>activeModelState</code> - The state of the model currently being served by the predictor's endpoints.</li>
<li class=""><code>targetModelState</code> - This will be set only when <code>transitionStatus</code> is not <code>UpToDate</code>, meaning that the target model differs from the currently-active model.</li>
<li class=""><code>transitionStatus</code> - Indicates state of the predictor relative to its current spec.</li>
<li class=""><code>modelCopies</code> - Model copy information of the predictor's model.</li>
<li class=""><code>lastFailureInfo</code> - Details about the most recent error associated with this predictor. Not all of the contained fields will necessarily have a value.</li>
</ul>
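<p>Taken together, these fields surface in the InferenceService status roughly as follows (an illustrative sketch of the shape; the exact nesting and values are examples, not output copied from a live cluster):</p>

```yaml
status:
  modelStatus:
    transitionStatus: UpToDate     # predictor matches its current spec
    states:
      activeModelState: Loaded     # state of the model currently being served
      targetModelState: ""         # set only when transitionStatus is not UpToDate
    modelCopies:
      totalCopies: 1
      failedCopies: 0
```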
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="-deploying-on-modelmesh">🚢 Deploying on ModelMesh<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.9-release#-deploying-on-modelmesh" class="hash-link" aria-label="Direct link to 🚢 Deploying on ModelMesh" title="Direct link to 🚢 Deploying on ModelMesh" translate="no">​</a></h3>
<p>To deploy an InferenceService on ModelMesh, the ModelMesh and KServe controllers still require the user to specify the <code>serving.kserve.io/deploymentMode: ModelMesh</code> annotation.
A complete example of an InferenceService with the new storage spec is shown below:</p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> serving.kserve.io/v1beta1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferenceService</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> example</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">tensorflow</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">mnist</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">annotations</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" 
style="color:#00a4db">serving.kserve.io/deploymentMode</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ModelMesh</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">predictor</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">modelFormat</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> tensorflow</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">storage</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">key</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 
localMinIO</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">path</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> tensorflow/mnist.savedmodel</span><br></span></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="️-other-new-features">🛠️ Other New Features<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.9-release#%EF%B8%8F-other-new-features" class="hash-link" aria-label="Direct link to 🛠️ Other New Features" title="Direct link to 🛠️ Other New Features" translate="no">​</a></h2>
<ul>
<li class="">Support <a href="https://kserve.github.io/archive/0.9/modelserving/v1beta1/mlflow/v2/" target="_blank" rel="noopener noreferrer" class="">serving MLFlow model format</a> via MLServer serving runtime.</li>
<li class="">Support <a href="https://kserve.github.io/archive/0.9/modelserving/autoscaling/autoscaling/" target="_blank" rel="noopener noreferrer" class="">unified autoscaling target and metric fields</a> for InferenceService components with both Serverless and RawDeployment mode.</li>
<li class="">Support <a href="https://kserve.github.io/archive/0.9/admin/kubernetes_deployment/" target="_blank" rel="noopener noreferrer" class="">InferenceService ingress class and url domain template configuration</a> for RawDeployment mode.</li>
<li class="">ModelMesh now has a default <a href="https://github.com/openvinotoolkit/model_server" target="_blank" rel="noopener noreferrer" class="">OpenVINO Model Server</a> ServingRuntime.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="️-whats-changed">⚠️ What's Changed?<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.9-release#%EF%B8%8F-whats-changed" class="hash-link" aria-label="Direct link to ⚠️ What's Changed?" title="Direct link to ⚠️ What's Changed?" translate="no">​</a></h2>
<ul>
<li class="">The KServe controller manager is changed from StatefulSet to Deployment to support HA mode.</li>
<li class="">log4j security vulnerability fix</li>
<li class="">Upgrade TorchServe serving runtime to 0.6.0</li>
<li class="">Update MLServer serving runtime to 1.0.0</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-release-notes">🔍 Release Notes<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.9-release#-release-notes" class="hash-link" aria-label="Direct link to 🔍 Release Notes" title="Direct link to 🔍 Release Notes" translate="no">​</a></h2>
<p>For complete release notes, including all changes, bug fixes, and known issues, visit the GitHub release pages for <a href="https://github.com/kserve/kserve/releases/tag/v0.9.0" target="_blank" rel="noopener noreferrer" class="">KServe</a> and <a href="https://github.com/kserve/modelmesh-serving/releases/tag/v0.9.0" target="_blank" rel="noopener noreferrer" class="">ModelMesh</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-acknowledgments">🙏 Acknowledgments<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.9-release#-acknowledgments" class="hash-link" aria-label="Direct link to 🙏 Acknowledgments" title="Direct link to 🙏 Acknowledgments" translate="no">​</a></h2>
<p>We want to thank all the contributors who made this release possible:</p>
<ul>
<li class=""><strong>Core Contributors</strong>: The KServe maintainers and working group members</li>
<li class=""><strong>Community</strong>: Everyone who reported issues, provided feedback, and tested features</li>
<li class=""><strong>LF AI &amp; Data Foundation</strong>: For supporting KServe's journey as an incubation project</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-join-the-community">🤝 Join the Community<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.9-release#-join-the-community" class="hash-link" aria-label="Direct link to 🤝 Join the Community" title="Direct link to 🤝 Join the Community" translate="no">​</a></h2>
<ul>
<li class="">Visit our <a href="https://kserve.github.io/website/" target="_blank" rel="noopener noreferrer" class="">Website</a> or <a href="https://github.com/kserve" target="_blank" rel="noopener noreferrer" class="">GitHub</a></li>
<li class="">Join the Slack (<a href="https://github.com/kserve/community?tab=readme-ov-file#questions-and-issues" target="_blank" rel="noopener noreferrer" class="">#kserve</a>)</li>
<li class="">Attend our community meeting by subscribing to the <a href="https://zoom-lfx.platform.linuxfoundation.org/meetings/kserve?view=month" target="_blank" rel="noopener noreferrer" class="">KServe calendar</a>.</li>
<li class="">View our <a href="https://github.com/kserve/community" target="_blank" rel="noopener noreferrer" class="">community github repository</a> to learn how to make contributions. We are excited to work with you to make KServe better and promote its adoption!</li>
</ul>
<hr>
<p><em>The KServe team is committed to making machine learning model serving simple, scalable, and standardized. Thank you for being part of our community!</em></p>]]></content:encoded>
            <category>Releases</category>
        </item>
        <item>
            <title><![CDATA[Announcing KServe v0.8]]></title>
            <link>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.8-release</link>
            <guid>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.8-release</guid>
            <pubDate>Fri, 18 Feb 2022 00:00:00 GMT</pubDate>
            <description><![CDATA[KServe 0.8 Release Blog Post]]></description>
            <content:encoded><![CDATA[<p><em>Published on February 18, 2022</em></p>
<p>Today, we are pleased to announce the v0.8.0 release of KServe! While the last release was focused on the <a href="https://blog.kubeflow.org/release/official/2021/09/27/kfserving-transition.html" target="_blank" rel="noopener noreferrer" class="">transition</a> of KFServing to KServe, this release was focused on unifying the InferenceService API for deploying models on KServe and ModelMesh.</p>
<blockquote>
<p><strong>Note</strong>: For current users of KFServing/KServe, please take a few minutes to answer this <a href="https://groups.google.com/g/kubeflow-discuss/c/B0trz3qZiJE" target="_blank" rel="noopener noreferrer" class="">short survey</a> and provide your feedback!</p>
</blockquote>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="️-whats-changed">⚠️ What's Changed<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.8-release#%EF%B8%8F-whats-changed" class="hash-link" aria-label="Direct link to ⚠️ What's Changed" title="Direct link to ⚠️ What's Changed" translate="no">​</a></h2>
<ul>
<li class=""><strong>ONNX Runtime Server</strong> has been removed from the supported serving runtime list. KServe by default now uses the <strong>Triton Inference Server</strong> to serve ONNX models.</li>
<li class="">KServe's <strong>PyTorchServer</strong> has been removed from the supported serving runtime list. KServe by default now uses <strong>TorchServe</strong> to serve PyTorch models.</li>
<li class="">A few main KServe SDK class names have been changed:<!-- -->
<ul>
<li class=""><strong>KFModel</strong> is renamed to <strong>Model</strong></li>
<li class=""><strong>KFServer</strong> is renamed to <strong>ModelServer</strong></li>
<li class=""><strong>KFModelRepository</strong> is renamed to <strong>ModelRepository</strong></li>
</ul>
</li>
</ul>
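<p>The renames above amount to a straightforward import update (e.g. <code>from kserve import Model, ModelServer</code>). As a quick reference, the mapping can be captured in a small lookup table (an illustrative migration aid written for this post, not part of the KServe SDK):</p>

```python
# The v0.8 SDK class renames, expressed as a lookup table.
# This helper is illustrative only; the actual migration is
# simply updating your imports to the new names.
KSERVE_V08_RENAMES = {
    "KFModel": "Model",
    "KFServer": "ModelServer",
    "KFModelRepository": "ModelRepository",
}

def migrate_symbol(old_name: str) -> str:
    """Return the v0.8 name for a pre-0.8 KServe SDK class name."""
    return KSERVE_V08_RENAMES.get(old_name, old_name)
```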
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-whats-new">🚀 What's New<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.8-release#-whats-new" class="hash-link" aria-label="Direct link to 🚀 What's New" title="Direct link to 🚀 What's New" translate="no">​</a></h2>
<p>Some notable updates are:</p>
<ul>
<li class=""><strong>ClusterServingRuntime</strong> and <strong>ServingRuntime</strong> CRDs are introduced. Learn more <a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.8-release#-servingruntimes-and-clusterservingruntimes" class="">below</a>.</li>
<li class="">A new <strong>Model Spec</strong> was introduced to the InferenceService Predictor Spec as a new way to specify models. Learn more <a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.8-release#-updated-inferenceservice-predictor-spec" class="">below</a>.</li>
<li class=""><strong>Knative 1.0</strong> is now supported and certified for the KServe Serverless installation.</li>
<li class=""><strong>gRPC</strong> is now supported for transformer to predictor network communication.</li>
<li class=""><strong>TorchServe</strong> Serving runtime has been updated to 0.5.2 which now supports the KServe V2 REST protocol.</li>
<li class=""><strong>ModelMesh</strong> now has multi-namespace support, and users can now deploy GCS or HTTP(S) hosted models.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-servingruntimes-and-clusterservingruntimes">🔧 ServingRuntimes and ClusterServingRuntimes<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.8-release#-servingruntimes-and-clusterservingruntimes" class="hash-link" aria-label="Direct link to 🔧 ServingRuntimes and ClusterServingRuntimes" title="Direct link to 🔧 ServingRuntimes and ClusterServingRuntimes" translate="no">​</a></h2>
<p>This release introduces two new CRDs, <em>ServingRuntimes</em> and <em>ClusterServingRuntimes</em>; the only difference between the two is that ServingRuntimes are namespace-scoped while ClusterServingRuntimes are cluster-scoped. A ServingRuntime defines the templates for Pods that can serve one or more particular model formats. Each ServingRuntime defines key information such as the container image of the runtime and a list of the model formats that the runtime supports.</p>
<p>In previous versions of KServe, supported predictor formats and container images were defined in a <a href="https://github.com/kserve/kserve/blob/release-0.7/config/configmap/inferenceservice.yaml#L7" target="_blank" rel="noopener noreferrer" class="">config map</a> in the control plane namespace. The ServingRuntime CRD allows for improved flexibility and extensibility in defining or customizing runtimes as you see fit, without having to modify any controller code or any resources in the controller namespace.</p>
<p>Several out-of-the-box ClusterServingRuntimes are provided with KServe so that users can continue to use KServe as they did before, without having to define the runtimes themselves.</p>
<p><strong>Example SKLearn ClusterServingRuntime:</strong></p>
<div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> serving.kserve.io/v1alpha1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> ClusterServingRuntime</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> kserve</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">sklearnserver</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">supportedModelFormats</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> sklearn</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">version</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">autoSelect</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean important" style="color:#36acaa">true</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">containers</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> kserve</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">container</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">image</span><span class="token punctuation" 
style="color:#393A34">:</span><span class="token plain"> kserve/sklearnserver</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">latest</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">args</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model_name=</span><span class="token punctuation" style="color:#393A34">{</span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain">.Name</span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">model_dir=/mnt/models</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">-</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">http_port=8080</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" 
style="color:#00a4db">resources</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">requests</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">cpu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">memory</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 2Gi</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">limits</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">cpu</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"1"</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">          </span><span class="token key atrule" style="color:#00a4db">memory</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> 2Gi</span><br></span></code></pre></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-updated-inferenceservice-predictor-spec">📋 Updated InferenceService Predictor Spec<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.8-release#-updated-inferenceservice-predictor-spec" class="hash-link" aria-label="Direct link to 📋 Updated InferenceService Predictor Spec" title="Direct link to 📋 Updated InferenceService Predictor Spec" translate="no">​</a></h2>
<p>A new Model spec was also introduced as part of the Predictor spec for InferenceServices. The InferenceService CRD had become unwieldy because each model serving runtime was its own object in the Predictor spec. This duplicated many fields in the schema and bloated the overall size of the CRD, and supporting a new model serving framework meant modifying the CRD and then the controller code.</p>
<p>Now, with the Model spec, a user specifies a model format and, optionally, a corresponding version. The KServe control plane then automatically selects and uses a <em>ClusterServingRuntime</em> or <em>ServingRuntime</em> that supports the given format. Each <em>ServingRuntime</em> maintains a list of supported model formats and versions; if a format has <code>autoSelect</code> set to <code>true</code>, that <em>ServingRuntime</em> becomes a candidate for automatic selection for that model format.</p>
<div class="theme-tabs-container tabs-container tabList__CuJ"><ul role="tablist" aria-orientation="horizontal" class="tabs"><li role="tab" tabindex="0" aria-selected="true" class="tabs__item tabItem_LNqP tabs__item--active">New Schema</li><li role="tab" tabindex="-1" aria-selected="false" class="tabs__item tabItem_LNqP">Previous Schema</li></ul><div class="margin-top--md"><div role="tabpanel" class="tabItem_Ymn6"><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> serving.kserve.io/v1beta1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferenceService</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> example</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">sklearn</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">isvc</span><br></span><span 
class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">predictor</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">model</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">modelFormat</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> sklearn</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">storageUri</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> s3</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">//bucket/sklearn/mnist.joblib</span><br></span></code></pre></div></div></div><div role="tabpanel" class="tabItem_Ymn6" hidden=""><div class="language-yaml codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-yaml codeBlock_bY9V 
thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><span class="token-line" style="color:#393A34"><span class="token key atrule" style="color:#00a4db">apiVersion</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> serving.kserve.io/v1beta1</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">kind</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> InferenceService</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">metadata</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">name</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> example</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">sklearn</span><span class="token punctuation" style="color:#393A34">-</span><span class="token plain">isvc</span><br></span><span class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token key atrule" style="color:#00a4db">spec</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token key atrule" style="color:#00a4db">predictor</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token key atrule" style="color:#00a4db">sklearn</span><span 
class="token punctuation" style="color:#393A34">:</span><span class="token plain"></span><br></span><span class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token key atrule" style="color:#00a4db">storageUri</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain"> s3</span><span class="token punctuation" style="color:#393A34">:</span><span class="token plain">//bucket/sklearn/mnist.joblib</span><br></span></code></pre></div></div></div></div></div>
<p>The previous way of defining predictors is still supported; however, the new approach is preferred going forward. Eventually, the previous schema, with framework names as keys in the predictor spec, will be removed.</p>
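<p>For reference, automatic runtime selection is driven by the runtime's list of supported model formats. The following is a sketch of a <em>ClusterServingRuntime</em> (the name and image are illustrative) that opts into auto-selection for the sklearn format:</p>
<pre><code class="language-yaml">apiVersion: serving.kserve.io/v1alpha1
kind: ClusterServingRuntime
metadata:
  name: kserve-sklearnserver
spec:
  supportedModelFormats:
    # autoSelect: true makes this runtime a candidate whenever an
    # InferenceService requests modelFormat sklearn without naming a runtime.
    - name: sklearn
      version: "1"
      autoSelect: true
  containers:
    - name: kserve-container
      image: kserve/sklearnserver:latest
      args:
        - --model_name={{.Name}}
        - --model_dir=/mnt/models
        - --http_port=8080
</code></pre>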
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-modelmesh-updates">🌐 ModelMesh Updates<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.8-release#-modelmesh-updates" class="hash-link" aria-label="Direct link to 🌐 ModelMesh Updates" title="Direct link to 🌐 ModelMesh Updates" translate="no">​</a></h2>
<p><a href="https://developer.ibm.com/blogs/kserve-and-watson-modelmesh-extreme-scale-model-inferencing-for-trusted-ai/" target="_blank" rel="noopener noreferrer" class="">ModelMesh</a> is being integrated as KServe's multi-model serving backend. With the inclusion of the aforementioned ServingRuntime CRDs and the Predictor Model spec, the two projects are now much more closely aligned, with continual improvements underway.</p>
<p>ModelMesh now supports multi-namespace reconciliation. Previously, the ModelMesh controller reconciled only resources deployed in the same namespace as the controller. Now, by default, ModelMesh handles InferenceService deployments in any "modelmesh-enabled" namespace. Learn more <a href="https://github.com/kserve/modelmesh-serving/blob/release-0.8/docs/install/install-script.md#setup-additional-namespaces" target="_blank" rel="noopener noreferrer" class="">here</a>.</p>
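<p>Assuming the namespace label described in the install documentation linked above, opting a namespace into ModelMesh reconciliation is a matter of labeling it (the namespace name here is illustrative):</p>
<pre><code class="language-yaml">apiVersion: v1
kind: Namespace
metadata:
  name: my-models
  labels:
    # Marks this namespace for reconciliation by the ModelMesh controller
    modelmesh-enabled: "true"
</code></pre>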
<p>Also, while ModelMesh previously supported only S3-based storage, we are happy to share that it now works with models hosted on GCS or served over HTTP(S).</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-release-notes">🔍 Release Notes<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.8-release#-release-notes" class="hash-link" aria-label="Direct link to 🔍 Release Notes" title="Direct link to 🔍 Release Notes" translate="no">​</a></h2>
<p>To see all release updates, check out the KServe <a href="https://github.com/kserve/kserve/releases/tag/v0.8.0" target="_blank" rel="noopener noreferrer" class="">release notes</a> and ModelMesh Serving <a href="https://github.com/kserve/modelmesh-serving/releases/tag/v0.8.0" target="_blank" rel="noopener noreferrer" class="">release notes</a>!</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-acknowledgments">🙏 Acknowledgments<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.8-release#-acknowledgments" class="hash-link" aria-label="Direct link to 🙏 Acknowledgments" title="Direct link to 🙏 Acknowledgments" translate="no">​</a></h2>
<p>We want to thank all the contributors who made this release possible:</p>
<ul>
<li class=""><strong>Authors</strong>: Dan Sun, Paul Van Eck, Vedant Padwal, Andrews Arokiam on behalf of the KServe Working Group</li>
<li class=""><strong>Core Contributors</strong>: The KServe maintainers and working group members</li>
<li class=""><strong>Community</strong>: Everyone who reported issues, provided feedback, and tested features</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-join-the-community">🤝 Join the Community<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.8-release#-join-the-community" class="hash-link" aria-label="Direct link to 🤝 Join the Community" title="Direct link to 🤝 Join the Community" translate="no">​</a></h2>
<ul>
<li class="">Visit our <a href="https://kserve.github.io/website/" target="_blank" rel="noopener noreferrer" class="">Website</a> or <a href="https://github.com/kserve" target="_blank" rel="noopener noreferrer" class="">GitHub</a></li>
<li class="">Join the Slack (<a href="https://kubeflow.slack.com/join/shared_invite/zt-n73pfj05-l206djXlXk5qdQKs4o1Zkg#/" target="_blank" rel="noopener noreferrer" class="">#kubeflow-kfserving</a>)</li>
<li class="">Attend a <a href="https://docs.google.com/document/d/1KZUURwr9MnHXqHA08TFbfVbM8EAJSJjmaMhnvstvi-k/edit#heading=h.4i9fb8ndp9vp" target="_blank" rel="noopener noreferrer" class="">biweekly community meeting on Wednesday 9am PST</a></li>
<li class="">View our <a href="https://github.com/kserve/website/blob/v0.8/docs/developer/developer.md" target="_blank" rel="noopener noreferrer" class="">developer</a> and <a href="https://github.com/kserve/website/blob/v0.8/docs/help/contributor/mkdocs-contributor-guide.md" target="_blank" rel="noopener noreferrer" class="">doc</a> contribution guides to learn how to make contributions. We are excited to work with you to make KServe better and promote its adoption!</li>
</ul>
<p><strong>Happy serving!</strong></p>
<hr>
<p><em>The KServe team is committed to making machine learning model serving simple, scalable, and standardized. Thank you for being part of our community!</em></p>]]></content:encoded>
            <category>Releases</category>
        </item>
        <item>
            <title><![CDATA[Announcing KServe v0.7 - Smooth Transition from KFServing to KServe]]></title>
            <link>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.7-release</link>
            <guid>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.7-release</guid>
            <pubDate>Mon, 11 Oct 2021 00:00:00 GMT</pubDate>
            <description><![CDATA[KServe 0.7 Release Blog Post]]></description>
            <content:encoded><![CDATA[<p><em>Published on October 11, 2021</em></p>
<p><a class="" href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kfserving-transition">KFServing is now KServe</a>, and the KServe 0.7 release is now available. This release also ensures a smooth migration experience for users moving from KFServing to KServe.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="️-whats-changed">⚠️ What's Changed<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.7-release#%EF%B8%8F-whats-changed" class="hash-link" aria-label="Direct link to ⚠️ What's Changed" title="Direct link to ⚠️ What's Changed" translate="no">​</a></h2>
<ul>
<li class=""><code>InferenceService</code> API group is changed from <code>serving.kubeflow.org</code> to <code>serving.kserve.io</code> <a href="https://github.com/kserve/kserve/issues/1826" target="_blank" rel="noopener noreferrer" class="">#1826</a>, <a href="https://kserve.github.io/archive/0.7/admin/migration/" target="_blank" rel="noopener noreferrer" class="">the migration job</a> is created for smooth transition.</li>
<li class="">Python SDK name is changed from <a href="https://pypi.org/project/kfserving" target="_blank" rel="noopener noreferrer" class="">kfserving</a> to <a href="https://pypi.org/project/kserve" target="_blank" rel="noopener noreferrer" class="">kserve</a>.</li>
<li class="">KServe Installation manifests <a href="https://github.com/kserve/kserve/issues/1824" target="_blank" rel="noopener noreferrer" class="">#1824</a>.</li>
<li class="">The models web app has been split out of the kserve repository into its own <a href="https://github.com/kserve/models-web-app" target="_blank" rel="noopener noreferrer" class="">models-web-app</a> repository.</li>
<li class="">Docs and examples have moved to the separate <a href="https://github.com/kserve/website" target="_blank" rel="noopener noreferrer" class="">website</a> repository.</li>
<li class="">KServe images have migrated to the kserve Docker Hub account.</li>
<li class="">v1alpha2 API group is deprecated <a href="https://github.com/kserve/kserve/issues/1850" target="_blank" rel="noopener noreferrer" class="">#1850</a>.</li>
</ul>
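<p>In practice, the API group change above means existing InferenceService manifests typically only need their <code>apiVersion</code> updated:</p>
<pre><code class="language-yaml"># Before (KFServing)
apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
# After (KServe)
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
</code></pre>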
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-whats-new">🚀 What's New<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.7-release#-whats-new" class="hash-link" aria-label="Direct link to 🚀 What's New" title="Direct link to 🚀 What's New" translate="no">​</a></h2>
<ul>
<li class="">
<p><strong>ModelMesh project is joining KServe</strong> under repository <a href="https://github.com/kserve/modelmesh-serving" target="_blank" rel="noopener noreferrer" class="">modelmesh-serving</a>!</p>
<p>ModelMesh is designed for high-scale, high-density, and frequently-changing model use cases. It intelligently loads and unloads AI models to and from memory to strike a balance between responsiveness to users and computational footprint. To learn more about ModelMesh features and components, check out the <a href="https://developer.ibm.com/blogs/kserve-and-watson-modelmesh-extreme-scale-model-inferencing-for-trusted-ai" target="_blank" rel="noopener noreferrer" class="">ModelMesh announcement blog</a> and <a href="https://www.linkedin.com/feed/update/urn:li:activity:6854064203360280576/" target="_blank" rel="noopener noreferrer" class="">join the talk at KubeCon NA for a deeper dive into ModelMesh and KServe</a>.</p>
</li>
<li class="">
<p><strong>(Alpha feature)</strong> Raw Kubernetes deployment support: the Istio/Knative dependency is now optional. Please follow the <a href="https://kserve.github.io/archive/0.7/admin/kubernetes_deployment" target="_blank" rel="noopener noreferrer" class="">guide</a> to install and enable <code>RawDeployment</code> mode.</p>
</li>
<li class="">
<p>KServe now has its own documentation <a href="https://kserve.github.io/website" target="_blank" rel="noopener noreferrer" class="">website</a>, temporarily hosted on GitHub Pages.</p>
</li>
<li class="">
<p>Support for v1 CRD and webhook configurations on Kubernetes 1.22 <a href="https://github.com/kserve/kserve/issues/1837" target="_blank" rel="noopener noreferrer" class="">#1837</a>.</p>
</li>
<li class="">
<p>The Triton model serving runtime now defaults to version 21.09 <a href="https://github.com/kserve/kserve/issues/1840" target="_blank" rel="noopener noreferrer" class="">#1840</a>.</p>
</li>
</ul>
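<p>As a sketch of the new <code>RawDeployment</code> mode mentioned above (annotation per the linked guide; the model URI is illustrative), an InferenceService can opt into raw Kubernetes deployments like this:</p>
<pre><code class="language-yaml">apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-raw
  annotations:
    # Deploys plain Deployment/Service/HPA objects instead of Knative resources
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    sklearn:
      storageUri: s3://bucket/sklearn/mnist.joblib
</code></pre>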
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-whats-fixed">🔧 What's Fixed<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.7-release#-whats-fixed" class="hash-link" aria-label="Direct link to 🔧 What's Fixed" title="Direct link to 🔧 What's Fixed" translate="no">​</a></h2>
<ul>
<li class="">Bug fix for Azure blob storage <a href="https://github.com/kserve/kserve/issues/1845" target="_blank" rel="noopener noreferrer" class="">#1845</a>.</li>
<li class="">Tar/Zip support for all storage options <a href="https://github.com/kserve/kserve/issues/1836" target="_blank" rel="noopener noreferrer" class="">#1836</a>.</li>
<li class="">Fix AWS_REGION env variable and add AWS_CA_BUNDLE for S3 <a href="https://github.com/kserve/kserve/issues/1780" target="_blank" rel="noopener noreferrer" class="">#1780</a>.</li>
<li class="">TorchServe custom package installation fix <a href="https://github.com/kserve/kserve/issues/1619" target="_blank" rel="noopener noreferrer" class="">#1619</a>.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-release-notes">🔍 Release Notes<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.7-release#-release-notes" class="hash-link" aria-label="Direct link to 🔍 Release Notes" title="Direct link to 🔍 Release Notes" translate="no">​</a></h2>
<p>For complete release notes including all changes, bug fixes, and known issues, visit the <a href="https://github.com/kserve/kserve/releases/tag/v0.7.0" target="_blank" rel="noopener noreferrer" class="">GitHub release page</a>.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-acknowledgments">🙏 Acknowledgments<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.7-release#-acknowledgments" class="hash-link" aria-label="Direct link to 🙏 Acknowledgments" title="Direct link to 🙏 Acknowledgments" translate="no">​</a></h2>
<p>We want to thank all the contributors who made this release possible:</p>
<p><strong>Individual Contributors:</strong></p>
<ul>
<li class=""><a href="https://github.com/andyi2it" target="_blank" rel="noopener noreferrer" class="">Andrews Arokiam</a></li>
<li class=""><a href="https://github.com/animeshsingh" target="_blank" rel="noopener noreferrer" class="">Animesh Singh</a></li>
<li class=""><a href="https://github.com/chinhuang007" target="_blank" rel="noopener noreferrer" class="">Chin Huang</a></li>
<li class=""><a href="http://github.com/yuzisun" target="_blank" rel="noopener noreferrer" class="">Dan Sun</a></li>
<li class=""><a href="https://github.com/jagadeeshi2i" target="_blank" rel="noopener noreferrer" class="">Jagadeesh</a></li>
<li class=""><a href="https://github.com/jinchihe" target="_blank" rel="noopener noreferrer" class="">Jinchi He</a></li>
<li class=""><a href="https://github.com/njhill" target="_blank" rel="noopener noreferrer" class="">Nick Hill</a></li>
<li class=""><a href="https://github.com/pvaneck" target="_blank" rel="noopener noreferrer" class="">Paul Van Eck</a></li>
<li class=""><a href="https://github.com/Iamlovingit" target="_blank" rel="noopener noreferrer" class="">Qianshan Chen</a></li>
<li class=""><a href="https://github.com/Suresh-Nakkeran" target="_blank" rel="noopener noreferrer" class="">Suresh Nakkiran</a></li>
<li class=""><a href="https://github.com/sukumargaonkar" target="_blank" rel="noopener noreferrer" class="">Sukumar Gaonkar</a></li>
<li class=""><a href="https://github.com/theofpa" target="_blank" rel="noopener noreferrer" class="">Theofilos Papapanagiotou</a></li>
<li class=""><a href="https://github.com/Tomcli" target="_blank" rel="noopener noreferrer" class="">Tommy Li</a></li>
<li class=""><a href="https://github.com/js-ts" target="_blank" rel="noopener noreferrer" class="">Vedant Padwal</a></li>
<li class=""><a href="https://github.com/PatrickXYS" target="_blank" rel="noopener noreferrer" class="">Yao Xiao</a></li>
<li class=""><a href="https://github.com/yuzliu" target="_blank" rel="noopener noreferrer" class="">Yuzhui Liu</a></li>
</ul>
<p><strong>Core Contributors</strong>: The KServe maintainers and working group members</p>
<p><strong>Community</strong>: Everyone who reported issues, provided feedback, and tested features during this important transition</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-join-the-community">🤝 Join the Community<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kserve-0.7-release#-join-the-community" class="hash-link" aria-label="Direct link to 🤝 Join the Community" title="Direct link to 🤝 Join the Community" translate="no">​</a></h2>
<ul>
<li class="">Visit our <a href="https://kserve.github.io/website/" target="_blank" rel="noopener noreferrer" class="">Website</a> or <a href="https://github.com/kserve" target="_blank" rel="noopener noreferrer" class="">GitHub</a></li>
<li class="">Join the <a href="https://kubeflow.slack.com/join/shared_invite/zt-n73pfj05-l206djXlXk5qdQKs4o1Zkg#/" target="_blank" rel="noopener noreferrer" class="">Slack (#kubeflow-kfserving)</a></li>
<li class="">Attend a <a href="https://docs.google.com/document/d/1KZUURwr9MnHXqHA08TFbfVbM8EAJSJjmaMhnvstvi-k/edit#heading=h.4i9fb8ndp9vp" target="_blank" rel="noopener noreferrer" class="">Biweekly community meeting on Wednesday 9am PST</a></li>
<li class="">Contribute at <a href="https://github.com/kserve/website/blob/main/docs/developer/developer.md" target="_blank" rel="noopener noreferrer" class="">developer</a> and <a href="https://github.com/kserve/website/blob/main/docs/help/contributor/mkdocs-contributor-guide.md" target="_blank" rel="noopener noreferrer" class="">doc contribution</a> guide to make code or doc contributions. We are excited to work with you to make KServe better and promote its adoption by more and more users!</li>
</ul>
<p><strong>Happy serving!</strong></p>
<hr>
<p><em>The KServe team is committed to making machine learning model serving simple, scalable, and standardized. Thank you for being part of our community during this important transition!</em></p>]]></content:encoded>
            <category>Releases</category>
        </item>
        <item>
            <title><![CDATA[KServe: The next generation of KFServing]]></title>
            <link>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kfserving-transition</link>
            <guid>https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kfserving-transition</guid>
            <pubDate>Mon, 27 Sep 2021 00:00:00 GMT</pubDate>
            <description><![CDATA[Announcing the transition from KFServing to KServe]]></description>
            <content:encoded><![CDATA[<p><em>Published on September 27, 2021</em></p>
<p>We are excited to announce the next chapter for KFServing. In coordination with the Kubeflow Project Steering Group, the <a href="https://github.com/kubeflow/kfserving" target="_blank" rel="noopener noreferrer" class="">KFServing GitHub repository</a> has now been transferred to an independent <a href="https://github.com/kserve/kserve" target="_blank" rel="noopener noreferrer" class="">KServe GitHub organization</a> under the stewardship of the Kubeflow Serving Working Group leads.</p>
<p>The project has been rebranded from <strong>KFServing</strong> to <strong>KServe</strong>, and we are planning to graduate the project from the Kubeflow Project later this year.</p>
<p><img decoding="async" loading="lazy" alt="KFServing to KServe Transition" src="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/assets/images/image1-88ae02ce8957a75ad191a74d1a743bfb.png" width="1256" height="730" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-project-background">🎯 Project Background<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kfserving-transition#-project-background" class="hash-link" aria-label="Direct link to 🎯 Project Background" title="Direct link to 🎯 Project Background" translate="no">​</a></h2>
<p>Developed collaboratively by Google, IBM, Bloomberg, NVIDIA, and Seldon, KFServing was published as open source in early 2019. The project set out to provide the following features:</p>
<ul>
<li class="">A simple, yet powerful, Kubernetes Custom Resource for deploying machine learning (ML) models on production across ML frameworks.</li>
<li class="">A performant, standardized inference protocol.</li>
<li class="">Serverless inference that scales according to live traffic patterns, supporting scale-to-zero on both CPUs and GPUs.</li>
<li class="">A complete story for production ML model serving, including prediction, pre/post-processing, explainability, and monitoring.</li>
<li class="">Support for deploying thousands of models at scale, plus inference graph capability for multiple models.</li>
</ul>
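<p>To illustrate the first point, a minimal KFServing-era InferenceService captures a full model deployment in a few lines (the bucket path here is illustrative):</p>
<pre><code class="language-yaml">apiVersion: serving.kubeflow.org/v1beta1
kind: InferenceService
metadata:
  name: sklearn-iris
spec:
  predictor:
    sklearn:
      # Model artifacts are pulled from this URI at startup
      storageUri: s3://bucket/sklearn/iris
</code></pre>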
<p>KFServing was created to address the challenges of deploying and monitoring machine learning models in production. After the project was open-sourced, we saw an explosion in demand for the software, leading to strong adoption and community growth. The project's scope has since grown, and we have developed multiple components along the way, including a growing body of documentation that needs its own website and an independent GitHub organization.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-whats-next">🚀 What's Next<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kfserving-transition#-whats-next" class="hash-link" aria-label="Direct link to 🚀 What's Next" title="Direct link to 🚀 What's Next" translate="no">​</a></h2>
<p>Over the coming weeks, we will be releasing <strong>KServe 0.7</strong> outside of the Kubeflow Project and will provide more details on how to migrate from KFServing to KServe with minimal disruption. KFServing 0.5.x/0.6.x releases will continue to be supported for six months after the KServe 0.7 release. We are also working on integrating core Kubeflow APIs and standards for <a href="https://docs.google.com/document/d/1a9ufoe_6DB1eSjpE9eK5nRBoH3ItoSkbPfxRA0AjPIc" target="_blank" rel="noopener noreferrer" class="">the conformance program</a>.</p>
<p>For contributors, please follow the KServe <a href="https://github.com/kserve/website/blob/v0.7/docs/developer/developer.md" target="_blank" rel="noopener noreferrer" class="">developer</a> and <a href="https://github.com/kserve/website/blob/v0.7/docs/help/contributor/mkdocs-contributor-guide.md" target="_blank" rel="noopener noreferrer" class="">doc contribution</a> guide to make code or doc contributions. We are excited to work with you to make KServe better and promote its adoption by more and more users!</p>
<p><img decoding="async" loading="lazy" alt="KServe Logo" src="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/assets/images/kserve-b9befb7647f020cdab9eb81b3f627404.png" width="3322" height="1677" class="img_ev3q"></p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-kserve-key-links">🔗 KServe Key Links<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kfserving-transition#-kserve-key-links" class="hash-link" aria-label="Direct link to 🔗 KServe Key Links" title="Direct link to 🔗 KServe Key Links" translate="no">​</a></h2>
<ul>
<li class=""><a href="https://kserve.github.io/website/" target="_blank" rel="noopener noreferrer" class="">Website</a></li>
<li class=""><a href="https://github.com/kserve/kserve/" target="_blank" rel="noopener noreferrer" class="">Github</a></li>
<li class=""><a href="https://kubeflow.slack.com/join/shared_invite/zt-n73pfj05-l206djXlXk5qdQKs4o1Zkg#/" target="_blank" rel="noopener noreferrer" class="">Slack (#kubeflow-kfserving)</a></li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-contributor-acknowledgement">🙏 Contributor Acknowledgement<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kfserving-transition#-contributor-acknowledgement" class="hash-link" aria-label="Direct link to 🙏 Contributor Acknowledgement" title="Direct link to 🙏 Contributor Acknowledgement" translate="no">​</a></h2>
<p>We'd like to thank all the KServe contributors for this transition work!</p>
<p><strong>Individual Contributors:</strong></p>
<ul>
<li class=""><a href="https://github.com/andyi2it" target="_blank" rel="noopener noreferrer" class="">Andrews Arokiam</a></li>
<li class=""><a href="https://github.com/animeshsingh" target="_blank" rel="noopener noreferrer" class="">Animesh Singh</a></li>
<li class=""><a href="https://github.com/chinhuang007" target="_blank" rel="noopener noreferrer" class="">Chin Huang</a></li>
<li class=""><a href="http://github.com/yuzisun" target="_blank" rel="noopener noreferrer" class="">Dan Sun</a></li>
<li class=""><a href="https://github.com/jagadeeshi2i" target="_blank" rel="noopener noreferrer" class="">Jagadeesh</a></li>
<li class=""><a href="https://github.com/jinchihe" target="_blank" rel="noopener noreferrer" class="">Jinchi He</a></li>
<li class=""><a href="https://github.com/njhill" target="_blank" rel="noopener noreferrer" class="">Nick Hill</a></li>
<li class=""><a href="https://github.com/pvaneck" target="_blank" rel="noopener noreferrer" class="">Paul Van Eck</a></li>
<li class=""><a href="https://github.com/Iamlovingit" target="_blank" rel="noopener noreferrer" class="">Qianshan Chen</a></li>
<li class=""><a href="https://github.com/Suresh-Nakkeran" target="_blank" rel="noopener noreferrer" class="">Suresh Nakkiran</a></li>
<li class=""><a href="https://github.com/sukumargaonkar" target="_blank" rel="noopener noreferrer" class="">Sukumar Gaonkar</a></li>
<li class=""><a href="https://github.com/theofpa" target="_blank" rel="noopener noreferrer" class="">Theofilos Papapanagiotou</a></li>
<li class=""><a href="https://github.com/Tomcli" target="_blank" rel="noopener noreferrer" class="">Tommy Li</a></li>
<li class=""><a href="https://github.com/js-ts" target="_blank" rel="noopener noreferrer" class="">Vedant Padwal</a></li>
<li class=""><a href="https://github.com/PatrickXYS" target="_blank" rel="noopener noreferrer" class="">Yao Xiao</a></li>
<li class=""><a href="https://github.com/yuzliu" target="_blank" rel="noopener noreferrer" class="">Yuzhui Liu</a></li>
</ul>
<p><strong>Core Contributors</strong>: The KServe maintainers and Kubeflow Serving Working Group leads</p>
<p><strong>Community</strong>: Everyone who supported this important transition and helped establish KServe as an independent project</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="-join-the-community">🤝 Join the Community<a href="https://deploy-preview-643--elastic-nobel-0aef7a.netlify.app/blog/kfserving-transition#-join-the-community" class="hash-link" aria-label="Direct link to 🤝 Join the Community" title="Direct link to 🤝 Join the Community" translate="no">​</a></h2>
<ul>
<li class="">Visit our <a href="https://kserve.github.io/website/" target="_blank" rel="noopener noreferrer" class="">Website</a> or <a href="https://github.com/kserve/kserve/" target="_blank" rel="noopener noreferrer" class="">GitHub</a></li>
<li class="">Join the <a href="https://kubeflow.slack.com/join/shared_invite/zt-n73pfj05-l206djXlXk5qdQKs4o1Zkg#/" target="_blank" rel="noopener noreferrer" class="">Slack (#kubeflow-kfserving)</a></li>
<li class="">Follow the KServe <a href="https://github.com/kserve/website/blob/v0.7/docs/developer/developer.md" target="_blank" rel="noopener noreferrer" class="">developer</a> and <a href="https://github.com/kserve/website/blob/v0.7/docs/help/contributor/mkdocs-contributor-guide.md" target="_blank" rel="noopener noreferrer" class="">doc contribution</a> guides to make contributions</li>
</ul>
<p><strong>Welcome to KServe!</strong></p>
<hr>
<p><em>The KServe team is committed to making machine learning model serving simple, scalable, and standardized. Thank you for being part of this exciting transition!</em></p>]]></content:encoded>
            <category>Announcements</category>
        </item>
    </channel>
</rss>