Triplex: A State-of-the-Art LLM for Knowledge Graph Construction
Triplex exceeds the performance of GPT-4o at knowledge graph construction at less than one-tenth of the cost.
The Challenge of Knowledge Graph Construction
Knowledge graphs have become increasingly important in various fields, from search engines to recommendation systems. They excel at answering complex, relational queries that traditional search methods often struggle with. For instance, a knowledge graph can easily handle a query like "Provide a list of AI employees who attended technology schools."
However, constructing these graphs has traditionally been a complex and resource-intensive process. Recent approaches like Microsoft's GraphRAG, while promising, come with prohibitive costs. GraphRAG's procedure requires at least one generated output token for every ingested input token, making it impractical for most applications.
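To see why that token ratio matters, here is a back-of-the-envelope sketch of the lower bound on generation cost. The function name and the prices are hypothetical placeholders, not real pricing; the only thing taken from the text is the assumption of at least one output token per input token.

```python
def min_generation_cost(input_tokens: int,
                        price_in_per_million: float,
                        price_out_per_million: float,
                        output_ratio: float = 1.0) -> float:
    """Lower bound on the cost of ingesting a corpus.

    output_ratio is the minimum number of generated tokens per ingested
    token (1.0 reflects the "at least one output token per input token"
    claim above). Prices are per million tokens, in any currency.
    """
    output_tokens = input_tokens * output_ratio
    return (input_tokens * price_in_per_million
            + output_tokens * price_out_per_million) / 1_000_000


# Hypothetical placeholder prices, chosen only to make the arithmetic easy.
cost = min_generation_cost(10_000_000,
                           price_in_per_million=1.0,
                           price_out_per_million=3.0)
print(cost)  # 40.0
```

Because output tokens are typically priced higher than input tokens, the output term dominates this bound, which is why a one-to-one output ratio becomes expensive on large corpora.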
Enter Triplex:
Triplex aims to disrupt this paradigm by reducing the generation cost of knowledge graphs tenfold. Here's what makes Triplex stand out:
Efficiency: Triplex outperforms few-shot prompted GPT-4 at 1/60th the inference cost.
Size: Triplex is so small that it can run directly on your laptop, making knowledge graph construction accessible to a wider audience.
Open Source: Freely available on Hugging Face and Ollama, encouraging community engagement and further development.
Specialized Performance: Unlike general-purpose LLMs, Triplex is specifically trained for triple extraction, the cornerstone of knowledge graph construction.
Instructable: You can specify exact input entity types and relationships, ensuring high-fidelity output tailored to your needs.
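As a rough illustration of the "Instructable" point, the sketch below assembles an extraction request from a caller-supplied schema. The function name, the wording, and the layout of the prompt are assumptions made for illustration; the prompt format Triplex actually expects is defined by its model card and may differ.

```python
def build_extraction_prompt(text: str,
                            entity_types: list[str],
                            predicates: list[str]) -> str:
    """Assemble an instruction that restricts extraction to a given schema.

    The wording and layout here are illustrative assumptions, not
    Triplex's documented prompt format.
    """
    if not entity_types or not predicates:
        raise ValueError("entity_types and predicates must be non-empty")
    return (
        "Extract knowledge graph triples from the text, using only the "
        "entity types and predicates listed.\n\n"
        f"Entity types: {', '.join(entity_types)}\n"
        f"Predicates: {', '.join(predicates)}\n\n"
        f"Text: {text}\n"
    )


prompt = build_extraction_prompt(
    "Paris is the capital of France",
    entity_types=["CITY", "COUNTRY"],
    predicates=["CAPITAL_OF", "LOCATED_IN"],
)
print(prompt)
```

Keeping the schema as explicit arguments makes it easy to reuse the same entity types and predicates across every document in a corpus, so the resulting triples share one vocabulary.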
How Triplex Works
Triplex efficiently converts unstructured text into "semantic triples", the building blocks of knowledge graphs. These triples follow a (subject > predicate > object) format. For example:
Input: "Paris is the capital of France"
Output:
CITY: Paris > CAPITAL_OF > COUNTRY: France
CITY: Paris > LOCATED_IN > COUNTRY: France
A more complex example:
Input: “Vincent van Gogh, a post-impressionist painter, created "The Starry Night" in 1889. This iconic artwork, with its swirling clouds, brilliant stars, and crescent moon, exemplifies the artist's unique style and emotional intensity. Van Gogh's bold use of color and expressive brushstrokes influenced many subsequent art movements, including Expressionism and Fauvism.”
Output:
ARTIST:Vincent van Gogh > BELONGS_TO_MOVEMENT > ART_MOVEMENT:post-impressionist
ARTWORK:The Starry Night > CREATED_BY > ARTIST:Vincent van Gogh
ARTIST:Vincent van Gogh > BELONGS_TO_MOVEMENT > ART_MOVEMENT:Expressionism
ARTIST:Vincent van Gogh > BELONGS_TO_MOVEMENT > ART_MOVEMENT:Fauvism
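Output in this shape is straightforward to load into a graph structure. The following is a minimal sketch, assuming every line follows the `TYPE:name > PREDICATE > TYPE:name` pattern shown in the examples above; `Triple`, `parse_triple`, and `build_graph` are hypothetical helper names, not part of Triplex or R2R.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject_type: str
    subject: str
    predicate: str
    obj_type: str
    obj: str


def parse_triple(line: str) -> Triple:
    """Parse one 'TYPE:name > PREDICATE > TYPE:name' line."""
    parts = [p.strip() for p in line.split(">")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 '>'-separated fields: {line!r}")
    subj, predicate, obj = parts
    # Split on the first colon only, so entity names may contain colons.
    s_type, _, s_name = subj.partition(":")
    o_type, _, o_name = obj.partition(":")
    if not s_name.strip() or not o_name.strip():
        raise ValueError(f"missing 'TYPE:name' in: {line!r}")
    return Triple(s_type.strip(), s_name.strip(), predicate,
                  o_type.strip(), o_name.strip())


def build_graph(lines):
    """Map each subject to a list of (predicate, object) edges."""
    graph = defaultdict(list)
    for line in lines:
        if line.strip():  # skip blank lines
            t = parse_triple(line)
            graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)


output = """\
CITY: Paris > CAPITAL_OF > COUNTRY: France
ARTWORK:The Starry Night > CREATED_BY > ARTIST:Vincent van Gogh
"""
graph = build_graph(output.splitlines())
print(graph["Paris"])  # [('CAPITAL_OF', 'France')]
```

The parser tolerates both spacings that appear above (`CITY: Paris` and `ARTIST:Vincent van Gogh`). It does not handle entity names that themselves contain `>`, so production code would need a stricter format or escaping.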
Performance Analysis
Our measurements showed that Triplex significantly outperforms GPT-4o on extraction quality while costing far less to run. The cost reduction comes from Triplex's smaller model size and from its ability to operate without few-shot context.
Triplex was created by further training a supervised fine-tuned (SFT) model on a preference-based dataset built with majority voting and topological sorting, using Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO). These additional training steps yielded substantial improvements in model performance. To assess these enhancements accurately, we ran an evaluation using Claude-3.5 Sonnet as the judge. The evaluation involved head-to-head comparisons among three models: triplex-base, triplex-kto, and GPT-4o. The results are presented in the table below:
Getting Started with Triplex
We've made it easy for you to explore and implement Triplex:
The Road Ahead
While Triplex is a step forward in knowledge graph construction, we recognize that many challenges remain, particularly in assembling coherent graphs from large datasets. Our team at SciPhi is committed to continuous improvement, working on enhancements to both the model and our knowledge graph construction pipeline in R2R.
As part of our commitment to advancing this technology, we're offering hands-on assistance to YC companies interested in trying out knowledge graphs with R2R. We'll work directly with you to ingest your data into a knowledge graph, free of charge.
Join the Knowledge Graph Revolution
Triplex represents more than just a new model; it's a step towards democratizing knowledge graph construction. By making this powerful tool open-source and efficient enough to run on a laptop, we're opening doors to innovations we've yet to imagine.
We invite you to download Triplex, experiment with it, and share your experiences. Together, we can push the boundaries of what's possible in knowledge representation and extraction.
For more information or to explore how Triplex can transform your data insights, reach out to us at founders@sciphi.ai. Let's shape the future of knowledge graphs together!
About SciPhi: We're passionate about making advanced AI technologies accessible and practical. Triplex is just the beginning of our journey to revolutionize how we interact with and understand complex information.