Documents · Wittify Docs

Documents are the units inside a knowledge base. You drag and drop or browse to upload, watch the engine turn raw files into searchable chunks, then open any document to see its scope, tags, description, and chunk inspector. Each document goes through a five-state pipeline before it is searchable.

The ingestion pipeline

Every uploaded document moves through the same five states, from Pending to Ready. A sixth status, Failed, appears if ingestion hits an error.

Status chip	What it means
Pending	Local placeholder while the file uploads.
Parsing	The engine is extracting text. PDFs use OCR for scanned pages, with Arabic optimisation.
Chunking	Text is split per the knowledge base’s chunk size and overlap settings.
Embedding	Each chunk is converted into a vector (covers 100+ languages).
Ready	The document is searchable in chats.
Failed	A server-side error occurred. The document detail page shows the exact error message.
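
The Chunking step can be pictured as a sliding window over the extracted text. The sketch below is illustrative only: it assumes chunk size and overlap are measured in characters (the engine may count tokens instead), and `split_into_chunks` is a hypothetical helper, not part of the product.

```python
def split_into_chunks(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share `overlap` characters. Character units are an
    assumption made for this sketch; the real engine may use tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and less than chunk_size")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(split_into_chunks("abcdefghij", chunk_size=4, overlap=1))
```

With a chunk size of 4 and an overlap of 1, each chunk starts with the last character of the previous one, which is how overlap preserves context across chunk boundaries.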

The list refreshes automatically while a document is moving through the pipeline. You can leave the page and come back; the status updates on the next refresh.
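
The refresh behaviour amounts to polling until the document reaches a terminal state. The sketch below models that idea under stated assumptions: the status names come from the table above, while `wait_until_settled`, the `fetch_status` callable, the polling interval, and the poll limit are all hypothetical and do not describe a documented Wittify API.

```python
import time
from enum import Enum
from typing import Callable


class Status(str, Enum):
    PENDING = "Pending"
    PARSING = "Parsing"
    CHUNKING = "Chunking"
    EMBEDDING = "Embedding"
    READY = "Ready"
    FAILED = "Failed"


# Ready and Failed are the only states a document does not leave on its own.
TERMINAL = {Status.READY, Status.FAILED}


def wait_until_settled(
    fetch_status: Callable[[], Status],
    interval_s: float = 2.0,          # assumed interval, not a documented value
    max_polls: int = 60,              # assumed limit so the loop always ends
    sleep: Callable[[float], None] = time.sleep,
) -> Status:
    """Poll fetch_status until it returns Ready or Failed, or polls run out."""
    status = fetch_status()
    polls = 1
    while status not in TERMINAL and polls < max_polls:
        sleep(interval_s)
        status = fetch_status()
        polls += 1
    return status
```

Passing `sleep` as a parameter keeps the loop easy to test: a caller can supply a no-op in place of `time.sleep`.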

Document detail page

Open any document for a tabbed view: Overview and SQL tables.

Overview tab

The default tab. Four cards stacked vertically: Details, Description, Document scope, and Tags.

Details card

Field	Notes
Size	Stored size in MB or KB.
Chunks	Number of chunks the document was split into.
Uploaded	The upload timestamp.
Status	The current pipeline state.
Ingestion error	When status is Failed, a red strip surfaces the server message.

Description card

Element	What it shows
Title	Description.
Subtitle	Short explanation of what this document contains. The assistant reads it when answering and when matching your questions to the right document.
Body	The current description text, or the empty state: "No description yet. Add one so the assistant can cite this document accurately."
Edit description button	Opens an inline editor with Save and Cancel buttons.

Document scope card

A read-only summary of what the engine found in the document.

Axis	What it shows
Sheets	XLSX worksheet names.
Headings	Top-level headings extracted from the document.
Languages	Auto-detected language tags.
Contains tables	A Yes or No flag.

The scope filter on the chat composer uses these axes when you ask a question. For example, a heading like "Section 4 - Refunds" shows up as a one-click filter in the chat composer’s scope panel.

Tags card

Inline metadata editor for the document.

Element	Notes
Title	Tags.
Subtitle	Attach metadata to this document. The scope filter uses these keys when you ask a question.
Tag rows	Each tag is a `key` plus a `value`, displayed as small chips. Click any chip to edit, or click the + Add tag button to add a new one.
Key rules	Letters, digits, underscore, or Arabic characters. Maximum 64 characters.
Value rules	Free-form text up to 1,024 characters.

Common tag examples: team: hr, year: 2024, region: gulf, confidential: yes.
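
The key and value rules can be checked before you save a tag. The sketch below is one reading of those rules, not the product's validator: it assumes "letters" means ASCII letters, that "Arabic characters" means the Unicode Arabic block (U+0600 to U+06FF), and that an empty key is invalid. `validate_tag` is a hypothetical helper.

```python
import re

# Assumptions: ASCII letters, digits, underscore, plus the Unicode Arabic
# block U+0600 to U+06FF; between 1 and 64 characters.
KEY_PATTERN = re.compile(r"[A-Za-z0-9_\u0600-\u06FF]{1,64}")
MAX_VALUE_LENGTH = 1024


def validate_tag(key: str, value: str) -> list[str]:
    """Return a list of problems; an empty list means the tag looks valid."""
    problems: list[str] = []
    if not KEY_PATTERN.fullmatch(key):
        problems.append(
            "key must be 1 to 64 letters, digits, underscores, or Arabic characters"
        )
    if len(value) > MAX_VALUE_LENGTH:
        problems.append("value must be at most 1,024 characters")
    return problems


for key, value in [("team", "hr"), ("year", "2024"), ("team name", "hr")]:
    print(key, "->", validate_tag(key, value) or "ok")
```

Under these assumptions a key such as `team name` is rejected because of the space, while `team_name` passes.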

SQL tables tab

When the document is an XLSX file, every sheet is materialised into a Postgres table during ingestion so the chat assistant can write SQL against it. This tab lists those tables.

Element per table card	Notes
Header	The table’s name.
Copy FROM clause button	Copies a snippet like `FROM project_xx.table_yy` to your clipboard.
Description editor	Inline editor with the placeholder "Describe what this sheet contains (e.g. Monthly water reuse totals by region, 2017–2024)."
Columns table	One row per column.

Column	What it shows
Original column	The header name as it appears in the XLSX file.
Postgres column	The sanitized column name the database actually uses.
Type	The inferred SQL type (text, integer, numeric, date, etc.).
Rows	The table’s total row count, displayed with the label rows.

When the document did not produce any SQL tables, the tab shows the empty state: "This document didn’t produce any SQL tables. Excel sheets are materialised into Postgres during ingestion; other formats are query-ready via RAG only."
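
The gap between the Original column and the Postgres column exists because spreadsheet headers often contain spaces, punctuation, or leading digits that are awkward as SQL identifiers. The exact sanitisation rules are not documented here, so the sketch below is a guess at one plausible scheme. `sanitize_column_name` is a hypothetical helper; the only external fact it relies on is that PostgreSQL truncates identifiers to 63 bytes by default.

```python
import re

POSTGRES_MAX_IDENTIFIER_BYTES = 63  # PostgreSQL's default identifier limit


def sanitize_column_name(header: str) -> str:
    """Turn a spreadsheet header into a lowercase ASCII identifier.

    Hypothetical scheme for illustration only; the engine's real rules may
    differ. Non-ASCII headers collapse to the placeholder "column" here.
    """
    name = header.strip().lower()
    # Replace every run of characters outside [a-z0-9_] with one underscore.
    name = re.sub(r"[^a-z0-9_]+", "_", name).strip("_")
    if not name:
        name = "column"
    # Unquoted SQL identifiers cannot start with a digit.
    if name[0].isdigit():
        name = "c_" + name
    # The result is ASCII, so characters and bytes are the same length.
    return name[:POSTGRES_MAX_IDENTIFIER_BYTES]


print(sanitize_column_name("Total Volume (m3)"))
print(sanitize_column_name("2024 Total"))
```

Whatever the real rules are, the Columns table is the source of truth: use the Postgres column value, not the original header, when you write SQL against the sheet.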

Chunk inspector

A power-user view of how the assistant sees this document. Open any chunk row to enter the inspector.

Element	What it shows
Header	Chunks count, Edited count, Pinned count.
Search box	Placeholder "Search chunks". Filters by content.
Each chunk card	Chunk index, page number, heading, plus small flags: OCR (came from optical character recognition), pinned, edited, boost (has keyword boosts), table (contains tabular content).
Edit chunk button	On every chunk card. Opens the editor.

Editing a chunk

Field	Notes
Chunk text	The raw extracted text. Editing this triggers a fresh embedding.
Keyword boost	Free-form list. Help text: "Synonyms the retriever will treat as if they appeared in the chunk."
Pin this chunk	Toggle. Help text: "Pinned chunks are always included in retrieval for this document."
Metadata	Read-only. Shows page number, heading, language, and any other auto-extracted properties.

When you click Save on a chunk-text edit, a confirmation dialog appears.

Element	What it says
Title	Re-embed this chunk?
Body	Saving the edited text triggers a fresh embedding (about 1 to 2 seconds). Citations pointing at this chunk keep working.
Cancel button	Closes the dialog without saving.
Save and re-embed button	Saves and re-embeds.

Keyword boost and pin do not trigger a re-embed. Only chunk-text edits do.
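
The rule can be expressed as a comparison of the chunk before and after an edit. The sketch below is a minimal model of that rule; `ChunkState` and `needs_reembed` are hypothetical names chosen for illustration, not product identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkState:
    text: str
    keyword_boosts: tuple[str, ...] = ()
    pinned: bool = False


def needs_reembed(before: ChunkState, after: ChunkState) -> bool:
    """Only a change to the chunk text calls for a fresh embedding."""
    return before.text != after.text


original = ChunkState(text="Refunds are processed within 14 days.")
boosted = ChunkState(text=original.text, keyword_boosts=("reimbursement",), pinned=True)
reworded = ChunkState(text="Refunds are processed within 10 days.")

print(needs_reembed(original, boosted))   # boosts and pin changed, text did not
print(needs_reembed(original, reworded))  # text changed
```

This matches the confirmation dialog: it appears for the second kind of edit and not for the first.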

Bulk delete

In the document list (back on the KB detail page), select one or more rows to reveal the bulk toolbar. Click Delete to open a confirmation:

Element	What it says
Title	Delete selected documents?
Body	This will remove every selected document, its chunks, and its vectors. This cannot be undone.
Cancel button	Closes the dialog.
Delete button	Styled with the destructive (red) background and white text.

Common questions

My document is stuck in Parsing.

Files larger than 10 MB are rejected before they leave your browser, so a file that reached Parsing is within the cap. A file can stay in Parsing for a minute or two on first upload. Wait, then refresh the page. If it stays stuck, the engine likely failed to read the file, for example because the PDF is encrypted or the format is unusual.
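
If you prepare uploads with a script, you can apply the same size check before sending a file. The sketch below assumes "10 MB" means 10 × 1024 × 1024 bytes; the product may instead count 10,000,000 bytes. `within_upload_cap` is a hypothetical helper.

```python
# Assumption: 10 MB is read as 10 MiB. The real cap may be 10,000,000 bytes.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def within_upload_cap(size_bytes: int) -> bool:
    """Return True when a file of this size would pass the upload cap."""
    return 0 <= size_bytes <= MAX_UPLOAD_BYTES


print(within_upload_cap(4_500_000))
print(within_upload_cap(25 * 1024 * 1024))
```

For a file on disk, `os.path.getsize(path)` from the standard library gives the byte count to pass in.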

The OCR fallback returned gibberish on my Arabic scan.

OCR quality depends on scan quality. Re-export the PDF at a higher resolution, ideally 300 DPI or above, and re-upload. Avoid handwritten content, because the engine is not trained on cursive.

I edited a chunk and the assistant still uses the old text.

Once you confirm Save and re-embed, the new text is live within 1 to 2 seconds. Older chats already in progress may still show citations to the old text until you ask a fresh question. Open a new chat to confirm.

Are pinned chunks always returned?

Yes, for retrieval against this document. Pinned chunks bypass the scoring step and are always included in the retrieval set for this document. Use pinning sparingly, because pinning everything defeats the relevance ranking.
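
A small model makes the trade-off concrete. The sketch below is an assumption about how pinning could combine with ranking, not the engine's actual retriever: pinned chunks are returned regardless of score, and the remaining chunks compete for `top_k` slots. Whether pinned chunks count against the limit is not documented; here they do not. `Chunk` and `select_chunks` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int
    score: float       # relevance score for the current question
    pinned: bool = False


def select_chunks(chunks: list[Chunk], top_k: int) -> list[Chunk]:
    """Return every pinned chunk, then the top_k highest-scoring others."""
    pinned = [c for c in chunks if c.pinned]
    ranked = sorted(
        (c for c in chunks if not c.pinned),
        key=lambda c: c.score,
        reverse=True,
    )
    return pinned + ranked[:top_k]


chunks = [
    Chunk(index=0, score=0.9),
    Chunk(index=1, score=0.1, pinned=True),  # low score, still returned
    Chunk(index=2, score=0.5),
    Chunk(index=3, score=0.7),
]
print([c.index for c in select_chunks(chunks, top_k=2)])
```

In this model, pinning every chunk would return the whole document for every question, which is why heavy pinning removes the benefit of ranking.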

The tags I added do not show up in the chat composer's scope filter.

The chat composer reads tag keys to populate the scope filter. After adding tags, refresh the chat page or start a new chat. The tags should appear in the filter dropdown.

The SQL tables tab shows nothing for my CSV file.

Only XLSX is materialised into Postgres tables today. CSV, JSON, TXT, and other text formats are query-ready via the document text only (RAG). To get SQL tables for a CSV, save it as an XLSX file first.

Can I delete a single chunk?

No. Chunks are owned by the document. Delete the document instead, or pin or edit chunks to nudge the assistant.

The chunk inspector is overwhelming.

You do not need to use it. The inspector exists for power users tuning retrieval. Most teams ship without ever opening a chunk. Tags and the description card cover the common cases.

Where to go next

Knowledge Bases

Back to the KB list.

Chats

Ask questions grounded in this document.

SQL Sources

For live database queries alongside documents.

Project Settings

See total storage and retrieval features.
