Welcome to the new DocsGPT 🦖 docs! 👋

How to train on other documentation

This AI can use any documentation, but first it needs to be prepared for similarity search.
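
To make "prepared for similarity search" concrete, here is a minimal sketch of the idea: each chunk of documentation is turned into a vector, and a question is answered from the chunk whose vector is closest to the question's vector. The bag-of-words `embed` function and the helper names are illustrative assumptions only; the real pipeline relies on OpenAI embeddings and a Faiss index rather than word counts.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: str, chunks: list[str]) -> str:
    """Return the chunk whose vector is closest to the query vector."""
    query_vec = embed(query)
    return max(chunks, key=lambda chunk: cosine_similarity(query_vec, embed(chunk)))


# Two hypothetical documentation chunks and a user question.
chunks = [
    "Use read_csv to load a CSV file into a DataFrame",
    "Use merge to join two DataFrames on a key column",
]
best = most_similar("how do I load a csv file", chunks)
print(best)
```

Preparing the documentation ahead of time means the chunk vectors are computed once and stored, so only the question has to be embedded at query time.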

Start by going to the /scripts/ folder.

If you open ingest.py there, you will see that it uses the RST files from the inputs folder to create index.faiss and index.pkl.

It currently uses OpenAI to create the vector store, so make sure your documentation is not too big. Processing the pandas documentation cost me around $3-4.
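
For a rough idea of the bill before running anything, a back-of-the-envelope estimate can be sketched as below. The four-characters-per-token ratio is only a common heuristic for English text, and `PLACEHOLDER_USD_PER_1K_TOKENS` is a made-up value for illustration, not a real price; check the current OpenAI pricing and substitute it.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count; ~4 characters per token is a heuristic, not a tokenizer."""
    return int(len(text) / chars_per_token)


def estimate_cost(total_tokens: int, usd_per_1k_tokens: float) -> float:
    """Cost in USD for embedding `total_tokens` at the given price per 1,000 tokens."""
    return total_tokens / 1000 * usd_per_1k_tokens


# Hypothetical placeholder price: replace with the provider's current rate.
PLACEHOLDER_USD_PER_1K_TOKENS = 0.0004

sample_text = "word " * 8000  # stand-in for the contents of your documentation
tokens = estimate_tokens(sample_text)
cost = estimate_cost(tokens, PLACEHOLDER_USD_PER_1K_TOKENS)
print(f"~{tokens} tokens, ~${cost:.4f}")
```

The figure reported by ingest.py itself is the one to trust; this sketch only helps decide whether a documentation set is worth trimming first.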

For most open-source projects, you can find the documentation on GitHub in the docs/ folder.

1. Find documentation in .rst/.md format and put it in a folder inside your scripts/ directory

Name it inputs/
Put all your .rst/.md files in there
The search is recursive, so you don't need to flatten them

If there are no .rst/.md files, just convert whatever you find to .txt and feed it in. (Don't forget to change the extension in the script.)
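
To preview which files would be picked up from inputs/, the recursive search by extension can be sketched as follows. The `collect_files` helper is a hypothetical illustration built on the standard library, not the function ingest.py actually uses.

```python
from pathlib import Path


def collect_files(root: str, formats=(".rst", ".md"), recursive: bool = True) -> list[Path]:
    """List files under `root` whose extension is in `formats`.

    With recursive=True, subdirectories are searched too, so the
    documentation tree does not need to be flattened first.
    """
    base = Path(root)
    candidates = base.rglob("*") if recursive else base.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in formats)
```

Calling `collect_files("inputs")` from the scripts/ folder lists every .rst and .md file at any depth; passing `formats=(".txt",)` mirrors the case where the documentation was converted to plain text.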

2. Create a .env file in the scripts/ folder

Write your OpenAI API key inside it:

OPENAI_API_KEY=<your-api-key>
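
A quick way to confirm the .env file is well-formed is to parse it yourself. The sketch below is a simplified KEY=VALUE reader written for illustration; `parse_env` is a hypothetical helper, and the script presumably loads the file through its own mechanism.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


example = "# OpenAI credentials\nOPENAI_API_KEY=<your-api-key>\n"
settings = parse_env(example)
print("OPENAI_API_KEY" in settings)
```

To check a real file, pass `Path(".env").read_text()` (from `pathlib`) to `parse_env` and verify that `OPENAI_API_KEY` is present and non-empty.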

3. Run scripts/ingest.py

python ingest.py ingest

It will tell you how much the run will cost and ask for confirmation before proceeding (pass -y to skip the confirmation).

4. Move the index.faiss and index.pkl files generated in scripts/output to the application/ folder.
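
If you prefer to script this step, the move can be sketched with the standard library. The `move_index` helper and its arguments are illustrative assumptions; the relative paths in the usage note assume the command is run from the repository root.

```python
import shutil
from pathlib import Path

INDEX_FILES = ("index.faiss", "index.pkl")


def move_index(output_dir: str, app_dir: str) -> list[Path]:
    """Move both index files from output_dir to app_dir; fail if either is missing."""
    src, dst = Path(output_dir), Path(app_dir)
    missing = [name for name in INDEX_FILES if not (src / name).is_file()]
    if missing:
        # Refuse to move a partial index: both files are needed together.
        raise FileNotFoundError(f"Missing in {src}: {', '.join(missing)}")
    dst.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.move(str(src / name), str(dst / name))) for name in INDEX_FILES]
```

A call such as `move_index("scripts/output", "application")` moves both files in one go. Checking for both files first avoids ending up with a new index.faiss next to a stale index.pkl.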

5. Run the web app

Once it is running, it will use the new context relevant to your documentation. Make sure you select default in the dropdown in the UI.

Customisation

To see all the options ingest.py supports, run:

python ingest.py --help

Options

ingest                              Runs the 'ingest' function, converting documentation to Faiss plus Index format
  --dir TEXT                        List of paths to directory for index creation. E.g. --dir inputs --dir inputs2 [default: inputs]
  --file TEXT                       File paths to use (optional; overrides directory). E.g. --file inputs/1.md --file inputs/2.md
  --recursive / --no-recursive      Whether to recursively search in subdirectories [default: recursive]
  --limit INTEGER                   Maximum number of files to read
  --formats TEXT                    List of required extensions (list with .). Currently supported: .rst, .md, .pdf, .docx, .csv, .epub, .html [default: .rst, .md]
  --exclude / --no-exclude          Whether to exclude hidden files (dotfiles) [default: exclude]
  -y, --yes                         Whether to skip price confirmation
  --sample / --no-sample            Whether to output a sample of the first 5 split documents [default: no-sample]
  --token-check / --no-token-check  Whether to group small documents and split large ones. Improves semantics. [default: token-check]
  --min_tokens INTEGER              Minimum number of tokens to not group [default: 150]
  --max_tokens INTEGER              Maximum number of tokens to not split [default: 2000]
convert                             Creates documentation in .md format from source code
  --dir TEXT                        Path to a directory with source code. E.g. --dir inputs [default: inputs]
  --formats TEXT                    Source code language from which to create documentation. Supports py, js and java. E.g. --formats py [default: py]
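
The effect of --token-check with --min_tokens and --max_tokens can be pictured with the sketch below: documents that are too small are merged with their neighbours, and documents that are too large are cut into pieces. This is an illustrative approximation, not the logic in ingest.py; `count_tokens` counts whitespace-separated words as a stand-in for a real tokenizer, and `group_and_split` is a hypothetical name.

```python
def count_tokens(text: str) -> int:
    """Whitespace word count, used here as a stand-in for a real tokenizer."""
    return len(text.split())


def group_and_split(docs: list[str], min_tokens: int = 150, max_tokens: int = 2000) -> list[str]:
    """Merge undersized documents, then split oversized ones."""
    # Pass 1: keep appending to the last chunk while it is still below min_tokens.
    grouped: list[str] = []
    for doc in docs:
        if grouped and count_tokens(grouped[-1]) < min_tokens:
            grouped[-1] = grouped[-1] + "\n" + doc
        else:
            grouped.append(doc)

    # Pass 2: cut any chunk above max_tokens into pieces of at most max_tokens words.
    # Note: split pieces are re-joined with single spaces, so original whitespace is lost.
    result: list[str] = []
    for doc in grouped:
        words = doc.split()
        if len(words) <= max_tokens:
            result.append(doc)
        else:
            for start in range(0, len(words), max_tokens):
                result.append(" ".join(words[start:start + max_tokens]))
    return result


# Tiny limits so the behaviour is visible on a toy input.
chunks = group_and_split(["a b", "c d", "e f g h i"], min_tokens=3, max_tokens=4)
print(chunks)
```

With these toy limits, the first two documents are merged because the first one is below the minimum, and the third is split because it exceeds the maximum. Chunks of a more uniform size tend to embed more meaningfully, which is presumably what "improves semantics" refers to.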