At Arc53, we're excited to introduce DocsGPT LLMs, a set of models tailored for the tasks you've been waiting for: documentation-based QA, RAG (Retrieval-Augmented Generation), and assisting developers and technical support teams by chatting with your data (basically all the same thing, really; it all started with the 2020 paper Retrieval Augmented Generation for Knowledge-Intensive NLP Tasks).
Building on that paper, DocsGPT is designed to hold conversations over your documents and data.
We have collected feedback from users and have also generated a synthetic dataset of 50k conversations that use RAG. You can explore the dataset on Hugging Face.
We used a whopping 50k high-quality examples to fine-tune all of the models, employing the LoRA fine-tuning process! Our focus was to optimize them for working with documentation.
DocsGPT-7b-falcon is fine-tuned on top of falcon-7b-instruct; training took 2 days on a single A10G GPU.
DocsGPT-14b is fine-tuned on top of llama-2-13b-hf; training also took 2 days, on 4x A10G GPUs.
DocsGPT-40b-falcon is fine-tuned on top of falcon-40b; training took 4 days on 8x A10G GPUs.
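If you're curious what a LoRA setup like this roughly looks like, here's a minimal sketch using the Hugging Face `peft` library. The rank, alpha, dropout, and target modules below are illustrative assumptions, not our exact training configuration.

```python
# Minimal LoRA fine-tuning sketch (hyperparameters are assumptions, not our exact config)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "tiiuae/falcon-7b-instruct"  # base model behind DocsGPT-7b-falcon
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)

lora_config = LoraConfig(
    r=16,                                # assumed adapter rank
    lora_alpha=32,                       # assumed scaling factor
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # Falcon's fused attention projection
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter weights are trained
```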
Wondering about the license? The models are released under Apache-2.0, so feel free to use them for your commercial ventures.
To run the models you first need a GPU; note that the requirements below assume no quantization.
Name | Base Model | Requirements (or similar) |
---|---|---|
DocsGPT-7b-falcon | falcon-7b-instruct | 1x A10G GPU |
DocsGPT-14b | llama-2-13b-hf | 2x A10 GPUs |
DocsGPT-40b-falcon | falcon-40b | 8x A10G GPUs |
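If you don't have that much GPU memory, you can usually still load the models quantized. Here's a rough sketch using 4-bit loading via bitsandbytes; we haven't published official quantized weights yet, so treat this as an illustrative, untested workaround rather than a supported path.

```python
# Sketch: loading DocsGPT-7b-falcon in 4-bit to fit on a smaller GPU (illustrative, not officially tested)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Arc53/docsgpt-7b-falcon"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,  # weights are quantized on load
    device_map="auto",
    trust_remote_code=True,
)
```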
Here's a simple snippet to get you started:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

model = "Arc53/docsgpt-7b-falcon"

tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
sequences = pipeline(
    "Girafatron is obsessed with giraffes...",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```
When preparing your prompts, adhere to this format:
### Instruction
(Question goes here)
### Context
(Provide document retrieval + system instructions)
### Answer
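To make that concrete, here's a small helper (a sketch, not part of the released code) that assembles a prompt in this format and feeds it to the pipeline from the snippet above:

```python
# Assemble a DocsGPT prompt in the expected Instruction/Context/Answer format
def build_prompt(question: str, context: str) -> str:
    return (
        "### Instruction\n"
        f"{question}\n"
        "### Context\n"
        f"{context}\n"
        "### Answer\n"
    )

prompt = build_prompt(
    "Create a mock request to /api/answer in python",
    "(retrieved documentation chunks + system instructions go here)",
)
# Reuses the `pipeline` and `tokenizer` objects defined earlier
sequences = pipeline(
    prompt,
    max_length=512,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
```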
Benchmarks? We're still crunching the numbers and will update you soon! We also want to find a good dataset to benchmark RAG models on, or create one.
In the meantime, here are a few examples comparing the models after fine-tuning with their non-tuned versions.
### Instruction
Create a mock request to /api/answer in python
### Context
You are a DocsGPT, friendly and helpful AI assistant by Arc53 that provides help with documents. You give thorough answers with code examples if possible.
Use the following pieces of context to help answer the users question. If its not relevant to the question, provide friendly responses.
You have access to chat history, and can use it to help answer the question.
When using code examples, use the following format:
```(language)
(code)
```
----------------
/api/answer
It's a POST request that sends JSON in the body with 4 values. Here is a JavaScript fetch example.
It will receive an answer for a user-provided question.
```
// answer (POST http://127.0.0.1:5000/api/answer)
fetch("http://127.0.0.1:5000/api/answer", {
"method": "POST",
"headers": {
"Content-Type": "application/json; charset=utf-8"},
"body": JSON.stringify({"question":"Hi","history":null,"api_key":"OPENAI_API_KEY","embeddings_key":"OPENAI_API_KEY",
"active_docs": "javascript/.project/ES2015/openai_text-embedding-ada-002/"})
})
.then((res) => res.text())
.then(console.log.bind(console))
```
In response you will get a json document like this one:
```
{
"answer": " Hi there! How can I help you?\\n",
"query": "Hi",
"result": " Hi there! How can I help you?\\nSOURCES:"}
```
/api/docs_check
It makes sure documentation is loaded on the server (just run it every time the user switches between libraries/documentations).
It's a POST request that sends JSON in the body with 1 value. Here is a JavaScript fetch example.
```
// answer (POST http://127.0.0.1:5000/api/docs_check)
fetch("http://127.0.0.1:5000/api/docs_check", {
"method": "POST",
"headers": {
"Content-Type": "application/json; charset=utf-8"},
{"docs":"javascript/.project/ES2015/openai_text-embedding-ada-002/"})
})
.then((res) => res.text())
.then(console.log.bind(console))
```
In response you will get a json document like this one:
```
{
"status": "exists"}
```
Issues and Pull requests
We value contributions to our issues in the form of discussion or suggestions; we recommend that you check out existing issues and our roadmap.
If you want to contribute by writing code, there are a few things you should know before doing it:
We have a frontend (React, Vite) and a backend (Python).
### Answer
```python
import requests
import json


def send_request(question):
    url = "http://127.0.0.1:5000/api/answer"
    payload = {
        "question": question,
        "history": None,
        "api_key": "YOUR_API_KEY",
        "embeddings_key": "YOUR_EMBEDDINGS_KEY",
        "active_docs": "javascript/.project/ES2015/openai_text-embedding-ada-002/",
    }
    headers = {"Content-Type": "application/json; charset=utf-8"}
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        answer = response.json()["answer"]
        return answer
    else:
        return "Error: Could not connect to server."


def check_docs_loaded():
    url = "http://127.0.0.1:5000/api/docs_check"
    payload = {"docs": "javascript/.project/ES2015/openai_text-embedding-ada-002/"}
    headers = {"Content-Type": "application/json; charset=utf-8"}
    response = requests.post(url, json=payload, headers=headers)
    if response.status_code == 200:
        status = response.json()["status"]
        if status == "exists":
            return True
        else:
            return False
    else:
        return False


def mock_request(question):
    if check_docs_loaded():
        return send_request(question)
    else:
        return "Error: Documentation not loaded."


# Example usage
question = "What is the capital of France?"
answer = mock_request(question)
print(answer)
```
This is a mock request to the /api/answer endpoint using Python. It sends a JSON in the body with the question and other required parameters. The response is then parsed and returned.
Sure! Here's an example of how to create a mock request to `/api/answer` in Python:
```python
import requests
def mock_request():
res = requests.get('http://127.0.0.1:5000/api/answer', headers={'Content-Type': 'application/json'})
return res.json()
```
You can then use this function to create a mock request to `/api/answer` and get the response as a JSON object:
```python
mock_request()
```
This will return a JSON object with the response from the mock request.
To make a mock request in Python for the `/api/answer` endpoint, you can use the `requests` library to simulate a GET request. Here is an example:
```python
import requests
url = "http://127.0.0.1:5000/api/answer"
payload = {"question":"Hi","history":null,"api_key":"OPENAI_API_KEY","embeddings_key":"OPENAI_API_KEY",
"active_docs":"javascript/.project/ES2015/openai_text-embedding-ada-002/"}
response = requests.request("GET", url, json=payload)
print(response.status_code)
print(response.text)
print("Request completed")
```
Replace `"Hi"` with your own question and `"OPENAI_API_KEY"` with the actual API key. Make sure to adjust the `url` and `payload` according to your specific setup.
This will send a GET request to the `/api/answer` endpoint with the specified payload. The `response.status_code` variable will hold the response status code. You can then access the response text using `response.text`. Finally, you can print a message to indicate that the request has been completed.
Note that this is only a simulation of a GET request, so you won't get a real response from the server. If you want to receive a real response, you will need to set up your own server to handle the request and provide the appropriate response.
```
# answer (POST http://127.0.0.1:5000/api/answer)
fetch("http://127.0.0.1:5000/api/answer", {
"method": "POST",
"headers": {
"Content-Type": "application/json; charset=utf-8"},
"body": JSON.stringify({"question":"Hi","history":null,"api_key":"OPENAI_API_KEY","embeddings_key":"OPENAI_API_KEY",
"active_docs": "javascript/.project/ES2015/openai_text-embedding-ada-002/"})
})
.then((res) => res.text())
.then(console.log.bind(console))
```
### Docs_check
```
# answer (POST http://127.0.0.1:5000/api/docs_check)
fetch("http://127.0.0.1:5000/api/docs_check", {
"method": "POST",
"headers": {
"Content-Type": "application/json; charset=utf-8"},
"body": JSON.stringify({"docs":"javascript/.project/ES2015/openai_text-embedding-ada-002/"})
})
.then((res) => res.text())
.then(console.log.bind(console))
```
### Installation
```
# Install dependencies
npm install
# Run the app
npm run start
```
### License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details
### Contributing
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and
we are always open to pull requests, and help.
### Acknowledgments
Thanks to the following people for their help and support:
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https://github.com/Arc53)
- [@Arc53](https
And here is a non-code-based example:
### Instruction
how to make a part inactive
### Context
You are a DocsGPT, friendly and helpful AI assistant by Arc53 that provides help with documents. You give thorough answers with code examples if possible.
Use the following pieces of context to help answer the users question. If its not relevant to the question, provide friendly responses.
You have access to chat history, and can use it to help answer the question.
When using code examples, use the following format:
```(language)
(code)
```
----------------
Adding and editing parts (Engineering/Parts)
https://docs.plex.com/pmc/en-us/engineering/parts/adding-and-editing-parts.htm
ADDING AND EDITING PARTS
Add a part
Parts are items that are manufactured or processed by your company. You can add them using the Parts screen.
1. Access the Parts screen.
2. On the Action Bar, click Add.
3. On the Add Part screen, for Part No, type a short identifier for the part to be displayed throughout the system.
Use characters A-Z, numbers 0-9, dash (-), underscore (_), and spaces. Other symbols may cause issues. Do not use
question marks (?), or the ampersand (&).
4. Specify the Part Type.
5. Specify the Part Status.
6. If necessary, specify additional details for the part.
7. Click Ok.
Configuration requirements
Edit a part
You can edit the information associated with your parts.
1. Access the Parts screen.
2. Adjust the filters, and click Search.
3. In the grid, click the Part No link.
4. Click Part Info from the side tab panel.
5. On the Part Detail screen, modify the part information as necessary.
6. Click Ok.
NOTE:
If Ok is not displayed, you do not have the access required to perform part edits. Some fields on the Part Detail
screen are also controlled by access security and you may not be able to update them.
Edit multiple parts
There are two ways you can edit multiple parts at once.
Mass Update
1. Access the Parts screen.
2. Adjust the filters, and click Search.
3. In the grid, select the check box for each part that you want to edit.
4. On the Action Bar, click Mass Update.
5. In the Mass Update Part Detail dialog box, adjust values as necessary.
6. Click Ok.
All of the selected parts are updated with the new values.
Uploads
You can edit multiple existing parts by uploading a file. The uploaded part information overwrites the existing parts.
1. Access the Parts screen.
2. On the Action Bar, click Uploads, and then click Select Part Upload File.
3. Upload the file that contains the new part information.
Make a part inactive
1. Access the Parts screen.
2. Adjust the filters, and click Search.
3. In the grid, click the Part No link.
4. Click Part Info from the side tab panel.
5. In the Part Detail screen, for Part Status, select a part status that is not flagged as active.
6. Click Ok.
NOTE:
When making a part inactive, carefully consider the financial, costing, quality, production, and engineering impact.
The Active and Used Parts report may be helpful in determining if you should make a part number inactive.
### Answer
To make a part inactive, you can follow these steps:
1. Access the Parts screen.
2. Adjust the filters, and click Search.
3. In the grid, click the Part No link.
4. Click Part Info from the side tab panel.
5. In the Part Detail screen, for Part Status, select a part status that is not flagged as active.
6. Click Ok.
NOTE: Make sure you carefully consider the financial, costing, quality, production, and engineering impact before making a part inactive. The Active and Used Parts report may be helpful in determining if you should make a part number inactive.
As you can see, after fine-tuning the models respond well and use the context as the source material to answer without hallucinating. 40b models are generally good at these kinds of tasks, but getting smaller models to do the same is where it becomes challenging.
Stay tuned for exciting examples of code-writing tasks, detailed guides, and quantised versions that run on laptops. We're sure you'll love to see DocsGPT privately crunching your data! Hopefully we will release some well-performing 3b models soon.