Skip to main content

Command Palette

Search for a command to run...

Building a KEV Insight Copilot with MindsDB

Using MindsDB and Airbyte for RAG-Based KEV Threat Intelligence and Analytics

Updated
10 min read
Building a KEV Insight Copilot with MindsDB
M
I'm an Technical Solution

Introduction

Background and Overview

Managing software vulnerabilities is challenging because new threats keep emerging. The KEV Insight Copilot helps tackle this by using RAG (Retrieval-Augmented Generation) to search a structured knowledge base of Known Exploited Vulnerabilities (KEVs).

Use Case & Flow Architecture

The KEV Insight Copilot acts as a smart assistant to help organizations manage vulnerabilities and prioritize mitigation efforts effectively. It uses automated data collection, a centralized knowledge base, and AI-driven natural language analysis to support quicker and more informed decision-making.

The workflow is as follows:

  1. Data Ingestion: The copilot automatically fetches and syncs the latest KEV data from the CISA Known Exploited Vulnerabilities Catalog, ensuring analysts always have up-to-date information.

  2. Data Warehouse: Airbyte transforms and loads the KEV records into a PostgreSQL database, where the data is cleaned, normalized, and prepared for efficient querying and retrieval.

  3. MindsDB AI: This AI Analytics and Knowledge Engine connects to the Postgres data source.

    • Builds RAG-enabled knowledge bases.

    • Supports SQL and natural-language queries over KEV data.

    • Provides context-aware response generation.

  4. Streamlit Web App: using the MindsDB Python SDK serves as the front-end for user can ask questions in natural language to quickly extract specific information, such as:

    • “Show me all KEVs affecting Cisco products.”

    • “Which vulnerabilities were added last month?”

    • “List critical exploited vulnerabilities requiring immediate mitigation.”

Setting Up the Foundation

Known Exploited Vulnerabilities (KEV) data is maintained by CISA and serves as the authoritative source for vulnerabilities confirmed to be actively exploited. This dataset will be the primary knowledge base for the AI agent.

Airbyte is an open-source data integration platform used here to automatically ingest and sync KEV data into PostgreSQL. It is installed using the official Airbyte OSS Quickstart.

PostgreSQL acts as the central data warehouse, storing normalized KEV records for efficient retrieval and analysis.

MindsDB provides the AI Agent layer, connecting to PostgreSQL and enabling:

  • RAG-based intelligence

  • Natural-language query capabilities

  • Knowledge Base creation

Deploy PostgreSQL and MindsDB in our environment using Docker Compose.

version: '3.9'

services:
  postgres:
    image: postgres:16
    container_name: postgres_cyberintel
    restart: always
    environment:
      POSTGRES_USER: mindsdb_user
      POSTGRES_PASSWORD: mindsdb_pass
      POSTGRES_DB: cyber_threat_db
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data

  mindsdb:
    image: mindsdb/mindsdb
    container_name: mindsdb_container
    restart: always
    environment:
      MINDSDB_APIS: "http,mysql"
    ports:
      - "47334:47334" # HTTP API
      - "47335:47335" # MySQL API
    depends_on:
      - postgres

To start the containers, use the Docker Compose setup provided.

docker compose up -d

Ingest CISA KEV Data to PostgreSQL using Airbyte

Step 1 : Set Up CISA Known Exploited Vulnerabilities (KEV) as a Source

To begin, let’s set up CISA Known Exploited Vulnerabilities (KEV) as a data source in Airbyte using custom connector:

  1. Run Airbyte on your own infrastructure

  2. Go to the Connector Builder and create a new project by selecting Start from scratch.

  3. Configure Global Settings and Inputs: Set up the connector parameters as follows:

ParameterSettingDescription
Endpoint URLURL Official CISA KEV feedThe target URL to fetch the CISA KEV feed in JSON format.
HTTP MethodGETThe standard method for retrieving data from the web resource.
Extractor (Record Selector)Dpath ExtractorA tool similar to JSONPath used to point to a specific part of the API response.
Field Path (Record Selector)vulnerabilitiesThis path points to the main array in the JSON response containing the list of vulnerabilities. Every element under this array will be treated as a single record.
Advanced Settings (Record Filter)bodyUsed to filter extracted records.
Primary Key (Single Key)cveIDThe cveID is selected as the primary key, which is an excellent practice as every vulnerability has a unique CVE, ensuring no record duplication upon storage.
  1. Click Publish. Choose Publish to organization so members can use the connector.

  2. Go back to the Airbyte Menu, click on "Sources," and select “New Source”. Filter by "Custom" and select the CISA KEV custom connector you just created.

  3. Save and Test the connection to ensure Airbyte can successfully access the data.

Step 2 : Set Up PostgreSQL as a Destination

Next, we configure PostgreSQL to be the final destination for our KEV data.

  1. Click on "Destinations" in the left-hand navigation menu and select "Postgres".

  2. Enter your PostgreSQL connection details (host, port, database, credentials, etc.) and authenticate Airbyte.

  3. Test the connection to ensure Airbyte can successfully connect to your PostgreSQL database.

Step 3 : Creating & Running a Connection

With both the CISA Known Exploited Vulnerabilities (KEV) data and PostgreSQL set up as the source and destination, we can now create a connection to start moving the data:

  1. Click on "Connections" in the left-hand navigation menu and select "+ New Connection".

  2. For Define source and destination, choose the existing CISA KEV source and the existing PostgreSQL destination.

  3. Configure the connection: Provide a clear Connection name, set the replication frequency (how often you want the data to sync), and define the destination namespace (the schema/table in PostgreSQL).

  4. When you're done, click “Finish & Sync” to start the initial sync job.

  5. Monitor the job status and progress in Airbyte to ensure the data is being transferred successfully to your PostgreSQL database.

The Airbyte connection will create the following tables in PostgreSQL that are used in MindsDB:

  • cisa_known_exploited — This is the main table that contains all KEV (Known Exploited Vulnerabilities) records from the CISA feed.

AI-Agent for Insight into Known Exploited Vulnerabilities (KEV)

Connecting MindsDB to PostgreSQL

To enable MindsDB to query the KEV dataset, we need to register PostgreSQL as a data source.

CREATE DATABASE cyber_postgres
WITH ENGINE = 'postgres',
PARAMETERS = {
  "user": "mindsdb_user",
  "password": "mindsdb_pass",
  "host": "host.docker.internal",
  "port": "5432",
  "database": "cyber_threat_db"
};

SELECT *
FROM cyber_postgres.public.cisa_known_exploited
LIMIT 10;

This query returns:

  • MindsDB can now access all tables inside the cyber_postgres database

  • Including the public.cisa_known_exploited table created by Airbyte

Create a Knowledge Base

A knowledge base in MindsDB is an AI table that stores embeddings and metadata for semantic reasoning. Instead of matching exact keywords, the Knowledge Base:

  • Understands cybersecurity context

  • Retrieves similar vulnerabilities using embeddings

  • Enables RAG-based question answering for analysts

MindsDB knowledge bases internally use:

  • Embedding Models → for vectorized semantic understanding

  • Vector Stores → for fast similarity retrieval

  • Rerankers → to highlight the most relevant KEV insights

Use the CREATE KNOWLEDGE_BASE command. This design allows analysts to search both technical context and business-critical metadata.

CREATE KNOWLEDGE_BASE kev_kb
USING
    embedding_model = {
        "provider": "openai",
        "model_name": "text-embedding-3-large",
        "api_key": "your-model-api-key"
    },
    reranking_model = {
        "provider": "openai",
        "model_name": "gpt-4o",
        "api_key": "your-model-api-key"
    },
    metadata_columns = ["vendorProject", "product", "cveID", "dateAdded"],
    content_columns = ["shortDescription", "vulnerabilityName", "notes"],
    id_column = "cveID";

Add Data to the Knowledge Bases

We load data from the PostgreSQL table cisa_known_exploited.

  • Embeddings are generated automatically.

  • Data becomes ready for AI semantic querying.

INSERT INTO kev_kb
SELECT *
FROM cyber_postgres.public.cisa_known_exploited;

Semantic Search Query

Search for vulnerabilities related to Microsoft.

SELECT *
FROM kev_kb
WHERE content = 'Microsoft';

This query returns:

Use Semantic Search with Metadata Filters

Query the knowledge base using semantic search and adjust the relevance parameter.

SELECT *
FROM kev_kb
WHERE vendorProject = 'Microsoft'
AND content = 'privilege escalation'
AND relevance >= 0.50;

This query returns:

Create a MindsDB Agent

A MindsDB Agent allows you to interact directly with the KEV dataset and knowledge base using natural language. It supports:

  • RAG-based responses

  • SQL + semantic search + metadata enrichment

Create an agent by connecting the data and setting up the underlying model.

CREATE AGENT kev_security_agent
USING
    model = {
        "provider": "openai",
        "model_name" : "gpt-4o",
        "api_key": "your-model-api-key"
    },
    data = {
         "knowledge_bases": ["kev_kb"]
    },
    prompt_template='
        You are a cybersecurity expert specializing in Known Exploited Vulnerabilities.
        describe data from kev_kb to answer questions precisely with references to CVE IDs.Always mention the product/vendor and type of vulnerability.
    ';

Test querying an agent by asking questions about the connected data.

SELECT answer
FROM kev_security_agent
WHERE question = 'List all vulnerabilities from Cisco?';

Deploy AI-Agent to Streamlit — KEV Insight Copilot

This section explains how to deploy the KEV Insight Copilot, a specialized RAG application created with Streamlit and the MindsDB Python SDK. The application is designed to use a MindsDB Agent to run natural language queries on the CISA Known Exploited Vulnerabilities (KEV).

Prerequisites

Before running the Streamlit application, please ensure the following critical conditions are met:

  1. MindsDB Server is Running: Make sure your MindsDB server is active and accessible.

  2. AI Agent Exists: The AI Agent should be set up in the MindsDB instance to process your natural language queries on the KEV data.

Creating the Streamlit Application

Create a file named app.py and fill it with code to set up the user interface, connect to MindsDB, and manage the chat logic.

import streamlit as st
import mindsdb_sdk
import pandas as pd
import time

st.set_page_config(page_title="KEV Insight Copilot", page_icon="🤖", layout="wide")

menu = st.sidebar.radio("📌 Navigation", ["KEV Insight Copilot"])

st.sidebar.title("KEV Security Agent")
server_url = st.sidebar.text_input("MindsDB Server URL", "http://127.0.0.1:47334")
project_name = st.sidebar.text_input("Project Name", "mindsdb")
agent_name = st.sidebar.text_input("Agent/Table Name", "kev_security_agent")

@st.cache_resource
def connect_mindsdb(url):
    try:
        return mindsdb_sdk.connect(url)
    except Exception:
        return None

server = connect_mindsdb(server_url)

if server is None:
    st.error("❌ Failed to connect to MindsDB. Please check the URL.")
    st.stop()

try:
    project = server.get_project(project_name)
except Exception as e:
    st.error(f"❌ Failed to get MindsDB project: {e}")
    st.stop()

if menu == "KEV Insight Copilot":

    st.title("🤖 KEV Insight Copilot")
    st.markdown("Ask anything related to known exploited vulnerabilities (CISA KEV Catalog).")

    if "processing" not in st.session_state:
        st.session_state.processing = False
    if "last_answer" not in st.session_state:
        st.session_state.last_answer = None
    if "history" not in st.session_state:
        st.session_state.history = []

    question = st.text_input("Your Question:")
    ask_btn = st.button("Ask", disabled=st.session_state.processing or len(question.strip()) == 0)

    if ask_btn:
        st.session_state.processing = True
        st.session_state.last_answer = None
        st.rerun()

    if st.session_state.processing:
        with st.spinner("Analyzing vulnerability database…"):
            current_question = question
            try:
                query_str = f"SELECT answer FROM {agent_name} WHERE question = '{current_question}';"
                result = project.query(query_str).fetch()

                answer = result.iloc[0]["answer"] if not result.empty else "⚠️ No answer available."
                st.session_state.last_answer = answer

                st.session_state.history.append({"role": "user", "text": current_question})
                st.session_state.history.append({"role": "agent", "text": answer})

            except Exception as e:
                st.session_state.last_answer = f"❌ Error: {str(e)}"

            time.sleep(0.5)
            st.session_state.processing = False
            st.rerun()

    for msg in reversed(st.session_state.history[-8:]):
        if msg["role"] == "user":
            st.markdown(f"🧑 **You:** {msg['text']}")
        else:
            st.markdown(f"🤖 **Agent:** {msg['text']}")

    if st.session_state.last_answer and not st.session_state.processing:
        st.success("✅ Done")


st.markdown("---")
st.caption("Powered by MindsDB • Threat Intelligence Automation")

Run the Streamlit application.

Run the application from your terminal with this command:

streamlit run app.py

This command will automatically open the KEV Insight Copilot application in your default web browser.

KEV Insight Copilot Overview

Once deployed, you can use KEV Insight Copilot by simply typing questions like:

“Show recently exploited vulnerabilities impacting Cisco routers.”

“Which exploited CVEs are related to web applications using Apache?”

“Which Cisco vulnerabilities are active exploitation?”

The chatbot will automatically query the KEV Knowledge Base, perform semantic and metadata searches, and summarize the information in a human-readable format.

You can chat with KEV Insight Copilot about Known Exploited Vulnerabilities. You can query specific CVEs, filter by vendor or product, and check recent additions to the KEV catalog. This helps your team stay informed and respond quickly to threats.

Conclusion

Building the KEV Insight Copilot was an exciting and meaningful experience, especially in seeing how AI can simplify vulnerability intelligence workflows. By allowing natural language interaction with MindsDB, you can easily explore KEV data, prioritize threats, and stay updated on newly exploited vulnerabilities—all without changing tools or manually searching databases.

I hope this article has been valuable and enjoyable to read. If you notice any issues or have suggestions for improvements, I would greatly appreciate your feedback. Thank you for taking the time to explore this project, and feel free to reach out if you’d like to discuss more enhancements or future development ideas.

Special thanks to: MindsDB