Generating texts, doing research, or answering general questions: there are many routine tasks in the day-to-day work of public administrations where AI can provide efficiency gains. The CityLAB is exploring this promising potential in joint workshops with Berlin's public administration. There, our internal prototyping team analyzes concrete needs together with public servants, conceives GDPR-compliant digital solutions, and tests them directly with users.
Our project BärGPT explores how an AI assistant needs to be built to sustainably reduce the workload of public employees. In this devlog series, CityLAB developers shine a light on the technical aspects and functions of the AI prototype and share their expert knowledge. This time: data scientist Malte Barth on how domain-specific knowledge can be conveyed to an AI chatbot.

How can we teach BärGPT administration knowledge?
This question was our most recent focus while developing our AI chatbot. The popular notion of “just” training an AI model on the necessary data was not an option for us, both because of the comparatively slow iteration speed during development and because of the enormous compute resources it would require. That's why we opted for a different approach: building up public administration knowledge through retrieval-augmented generation (RAG).
Why not fine-tuning?
Fine-tuning in machine learning refers to the process of adapting existing AI models to specific tasks or domains. However, this is quite time- and resource-intensive: the larger the base model, the more compute and training time fine-tuning requires. Additionally, the information baked into the model can't be updated afterwards, unless the model is retrained, e.g. on a newer version of the provided documents or on entirely new ones. Even then, it can't be ruled out that the model “remembers” old versions of the initial training data, leading it to generate wrong information. That's why we decided against fine-tuning the open-source large language model (LLM) by the European AI company Mistral that we are using.
Expert knowledge through RAG
Retrieval-augmented generation, RAG for short, is a method for extending the existing knowledge of an LLM with additional knowledge from a knowledge database. We already use RAG when users upload documents to BärGPT and ask questions about them. Technically, this works by first parsing the document's content with optical character recognition (OCR) and saving it in our database. The content is then split into smaller chunks, and so-called vector embeddings are generated for each of them. These high-dimensional vectors capture the semantic characteristics of a chunk and enable semantic search, so that words or paragraphs similar in meaning to a given user query can be found.
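To make the pipeline more concrete, here is a minimal sketch of the chunking and semantic search steps in Python. It is not BärGPT's actual code: the chunk size, the cosine-similarity ranking, and the assumption that the embeddings have already been produced by some embedding model are all simplifications for illustration.

```python
import numpy as np

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split OCR output into overlapping character chunks so that
    context at chunk boundaries is not lost entirely."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start : start + chunk_size])
        start += chunk_size - overlap
    return chunks

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(
    query_embedding: np.ndarray,
    chunk_embeddings: list[np.ndarray],  # in practice produced by an embedding model
    chunks: list[str],
    top_k: int = 3,
) -> list[str]:
    """Return the top_k chunks whose embeddings are most similar
    to the embedding of the user query."""
    scores = [cosine_similarity(query_embedding, e) for e in chunk_embeddings]
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]
```

The retrieved chunks are then placed into the model's context alongside the user question, which is what “augments” the generation.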
This approach has the advantage that the AI model always has the data relevant to the current user query available without having to be retrained. RAG also helps ensure that the model has the same understanding of a given document as the person who uploaded it.
We wanted to use this function for public administration knowledge as well, which also needs to be updated or extended at regular intervals. Our test users understandably expect BärGPT to already possess this kind of knowledge, since the AI assistant should be knowledgeable in the administrative context. They would therefore probably not start by uploading general administration documents, such as the Joint Rules of Procedure of the Berlin Administration (GGO), and adding them to the chat, as is currently the case for their personal documents.
Base knowledge for all
This is why we chose a different approach: in addition to the documents users upload manually, there is now another class of documents we call base knowledge documents. These are uploaded by admins (currently our development team) and go through the same processing chain as the private documents. The difference is that no single user “owns” the document; instead, it is owned by an access group. In the future, these groups can cover use cases where certain documents are only relevant to certain departments.
We are starting with a single access group containing all users. Documents uploaded for this group are therefore relevant to all public servants and enable BärGPT to correctly answer questions concerning the public administration of Berlin.
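In simplified form, the ownership model could be pictured like this. This is a hypothetical sketch with made-up names, not our actual database schema:

```python
from dataclasses import dataclass, field

@dataclass
class AccessGroup:
    id: str
    name: str  # e.g. an "all users" group or, later, a department
    member_ids: set[str] = field(default_factory=set)

@dataclass
class Document:
    id: str
    title: str
    owner_user_id: str | None = None   # set for privately uploaded documents
    owner_group_id: str | None = None  # set for base knowledge documents

def readable_by(user_id: str, doc: Document, groups: dict[str, AccessGroup]) -> bool:
    """A document is readable if the user uploaded it themselves or
    belongs to the access group that owns it."""
    if doc.owner_user_id == user_id:
        return True
    if doc.owner_group_id is not None:
        return user_id in groups[doc.owner_group_id].member_ids
    return False
```

With a single “all users” group, every base knowledge document is readable by everyone; adding department-specific groups later would require no change to this check.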


When does the model need base knowledge?
Not every user request to BärGPT requires base knowledge; for simple questions, the detour through semantic document search can be skipped in favor of the LLM answering from its own training knowledge. That's why we designed the base knowledge search as a so-called “tool”. Tools are functions made available to the AI model that it can “decide” to use or not. This has the advantage that the model “decides” for itself whether a question requires public administration information that was not available during its training in order to be answered correctly. For queries concerning private documents, the tool is currently disabled temporarily. We will run further user tests to verify whether base knowledge is still needed in those cases, e.g. to understand certain terms or paragraphs.
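To illustrate, here is how such a tool could be declared in the JSON-schema style that common chat-completion APIs (Mistral's among them) use for function calling. The names and the gating logic are illustrative, not BärGPT's actual implementation:

```python
# Tool declaration in the JSON-schema style used by common
# chat-completion APIs. The model sees the name and description and can
# "decide" to request a call to this function with a filled-in query.
BASE_KNOWLEDGE_TOOL = {
    "type": "function",
    "function": {
        "name": "search_base_knowledge",
        "description": (
            "Semantic search over Berlin administration base knowledge "
            "documents. Use this when the question requires administrative "
            "knowledge that may not be covered by your training data."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query derived from the user's question.",
                }
            },
            "required": ["query"],
        },
    },
}

def available_tools(concerns_private_documents: bool) -> list[dict]:
    """Current behavior: the base knowledge tool is withheld while the
    query concerns a user's private documents (still under evaluation)."""
    return [] if concerns_private_documents else [BASE_KNOWLEDGE_TOOL]
```

When the model requests the tool, the backend runs the semantic search sketched above over the base knowledge documents and feeds the retrieved chunks back to the model for the final answer.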
BärGPT “learns” administration knowledge
In the coming weeks, our service design team will once again talk to public servants to find out which documents BärGPT needs to know, and where necessary cite, in order to be used effectively in public administration. As soon as we have gathered a verified list of documents, we will add them as base knowledge so that questions about their contents can be answered as accurately and up-to-date as possible. As a next step, we will also gather more specific documents containing knowledge relevant only to certain sub-departments or district offices.
In the following devlogs, you will learn how this feature and additional functions evolve. You can find more information about BärGPT on our project page. Stay tuned!
