Aryn
Aryn is an AI-powered ETL system for complex, unstructured documents such as PDFs, HTML pages, and presentations. It is purpose-built for RAG and GenAI applications and provides up to 6x better accuracy in chunking and extracting information from documents. For real-world use cases, this can translate into 30% better recall and a 2x improvement in answer accuracy.
Aryn’s ETL system has two components: Sycamore and the Aryn Partitioning Service. Sycamore is Aryn’s open-source document processing engine, available as a Python library. It provides transforms for information extraction, LLM-powered enrichment, data cleaning, vector embedding creation, and loading data into Pinecone indexes.
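To make the idea of "a set of transforms" concrete, here is a minimal, hypothetical sketch of a pipeline built by chaining per-document transforms. It is **not** Sycamore's API: `Document`, `Pipeline`, `normalize_whitespace`, and `add_word_count` are illustrative names invented for this example, and the word-count step only stands in for the LLM-powered enrichment a real pipeline would perform.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Document:
    """Hypothetical stand-in for a document moving through a pipeline."""
    doc_id: str
    text: str
    properties: Dict[str, object] = field(default_factory=dict)


# A step takes the whole list of documents and returns a new list.
Step = Callable[[List[Document]], List[Document]]


class Pipeline:
    """Hypothetical immutable chain of transforms, applied in order by run()."""

    def __init__(self, steps: Optional[List[Step]] = None) -> None:
        self.steps: List[Step] = steps or []

    def map(self, fn: Callable[[Document], Document]) -> "Pipeline":
        # Apply fn to every document; returns a new Pipeline and leaves self unchanged.
        return Pipeline(self.steps + [lambda docs: [fn(d) for d in docs]])

    def filter(self, keep: Callable[[Document], bool]) -> "Pipeline":
        # Drop documents for which keep(...) returns False.
        return Pipeline(self.steps + [lambda docs: [d for d in docs if keep(d)]])

    def run(self, docs: List[Document]) -> List[Document]:
        for step in self.steps:
            docs = step(docs)
        return docs


def normalize_whitespace(doc: Document) -> Document:
    # Data cleaning: collapse runs of spaces, tabs, and newlines into single spaces.
    return Document(doc.doc_id, " ".join(doc.text.split()), dict(doc.properties))


def add_word_count(doc: Document) -> Document:
    # Placeholder for enrichment; a real pipeline might call an LLM here instead.
    props = dict(doc.properties)
    props["word_count"] = len(doc.text.split())
    return Document(doc.doc_id, doc.text, props)


pipeline = (
    Pipeline()
    .map(normalize_whitespace)
    .filter(lambda d: d.text != "")
    .map(add_word_count)
)
result = pipeline.run([Document("a", "  Hello   world \n"), Document("b", "   ")])
# result == [Document("a", "Hello world", {"word_count": 2})]
```

Returning a new `Pipeline` from each `map` or `filter` call keeps earlier stages reusable: you can branch one base pipeline into several variants without them interfering with each other.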
The Aryn Partitioning Service is used as the first step in a Sycamore data processing pipeline. It identifies and extracts the parts of a document, such as text, tables, and images, using a state-of-the-art vision segmentation AI model trained on hundreds of thousands of human-annotated documents.
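Because partitioning labels each region of a document by type, later chunking can respect those boundaries instead of cutting text at fixed character offsets. The sketch below is a hypothetical illustration of that idea, not the Aryn Partitioning Service's output schema: the `Element` class, its `kind` labels (`"Title"`, `"Text"`, `"Table"`, `"Image"`), and the `chunk_elements` strategy are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """Hypothetical partitioner output: one detected region of a page."""
    kind: str  # illustrative labels: "Title", "Text", "Table", "Image"
    text: str
    page: int


def chunk_elements(elements: List[Element], max_words: int = 50) -> List[str]:
    """Group consecutive text-like elements into chunks. A new chunk starts at
    each title or when adding an element would exceed max_words; each table
    becomes its own chunk, and images are skipped in this sketch."""
    chunks: List[str] = []
    current: List[str] = []
    current_words = 0

    def flush() -> None:
        nonlocal current, current_words
        if current:
            chunks.append("\n".join(current))
        current, current_words = [], 0

    for el in elements:
        words = len(el.text.split())
        if el.kind == "Table":
            # Keep tables whole so rows and columns are never split across chunks.
            flush()
            chunks.append(el.text)
        elif el.kind == "Image":
            # A real pipeline might caption or summarize images; this sketch drops them.
            continue
        else:
            if el.kind == "Title" or current_words + words > max_words:
                flush()
            current.append(el.text)
            current_words += words
    flush()
    return chunks


elements = [
    Element("Title", "Quarterly report", 1),
    Element("Text", "Revenue grew in every region.", 1),
    Element("Table", "Region | Revenue\nEU | 10\nUS | 12", 1),
    Element("Image", "", 2),
    Element("Title", "Outlook", 2),
    Element("Text", "We expect steady growth.", 2),
]
chunks = chunk_elements(elements)
# chunks == ["Quarterly report\nRevenue grew in every region.",
#            "Region | Revenue\nEU | 10\nUS | 12",
#            "Outlook\nWe expect steady growth."]
```

A single element longer than `max_words` still becomes one chunk on its own; a production chunker would likely split such elements further.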
The Pinecone integration with Aryn enables developers to chunk documents, create vector embeddings, and load high-quality data into Pinecone.
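The last two steps, embedding chunks and loading them into Pinecone, come down to turning each chunk into a record with an ID, a vector, and metadata. The sketch below builds records in the `id`/`values`/`metadata` dictionary shape that Pinecone's upsert accepts. It is an illustration under stated assumptions: `hash_embed` is a toy feature-hashing function standing in for a real embedding model, and `to_records`, `batched`, the 8-dimension default, and the `doc_id#chunk-N` ID scheme are hypothetical choices for this example.

```python
import hashlib
import math
from typing import Dict, Iterator, List


def hash_embed(text: str, dim: int = 8) -> List[float]:
    """Toy feature-hashing embedder. It is NOT a semantic embedding model;
    it only produces deterministic, L2-normalized vectors for illustration."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        # Map each token to one of `dim` buckets and count occurrences.
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def to_records(doc_id: str, chunks: List[str], dim: int = 8) -> List[Dict]:
    """Build one id/values/metadata record per non-empty chunk."""
    records = []
    for i, chunk in enumerate(chunks):
        if not chunk.strip():
            continue  # skip empty chunks, which would embed to an all-zero vector
        records.append({
            "id": f"{doc_id}#chunk-{i}",
            "values": hash_embed(chunk, dim),
            # Storing the chunk text lets a RAG app return it alongside search hits.
            "metadata": {"doc_id": doc_id, "chunk_index": i, "text": chunk},
        })
    return records


def batched(records: List[Dict], batch_size: int = 100) -> Iterator[List[Dict]]:
    """Yield successive slices so a large upload can be split into several requests."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


records = to_records("report", ["Revenue grew in every region.", "", "Outlook"])
# [r["id"] for r in records] == ["report#chunk-0", "report#chunk-2"]
```

With the Pinecone Python client, each batch could then be sent with `index.upsert(vectors=batch)`. The index dimension must match the vector length (`dim` here), and in practice you would replace `hash_embed` with a real embedding model.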
To get started, see the [Sycamore Pinecone connector documentation](https://sycamore.readthedocs.io/en/stable/sycamore/connectors/pinecone.html).