Gen Ai and data mapping

What aspects of data can we enhance with Genai? How are we able to improve data analytics, data screening and data lineage? In practice working with data involves working with multiple data sources, numerous data attributes and a variety of data names. To cope with complexity and to document data lineage mapping documents are used.

As mentioned by Atlan, a mapping document "defines how each source data element corresponds to a target data element". It "includes source-to-target mapping rules, transformation rules, data validation rules, business rules, data lineage, and relevant metadata". It is the cornerstone of a data analyst and changes in data streams should be reflected by the mapping.

OpenAi Assistents (basically an AI agent) help you to focus on specific content and a specific result. In our case the assistent should draft data mapping documents with the knowledge of a data analyst. We can train the assistent using example data: detailed mapping documents of databases we already have or general information about data analysis.

Fine-tuning and post-training

A Large Language Model (LLM) is a neural network trained on large datasets. Based on these datasets it determines different probability weights which can be used to generate output. The process of converting these datasets into weights is called pre-training. Pre-training has envolved rapidly in the past years that's why the LLMs are increasingly sufficticated and become more useful. It takes a very large capital investment to perform the training, because it uses a lot of computer power.

Although the model knows numerous specific facts, after a model has been trained you could see it as generic. It is not tailored for specific usecases or domains yet. Let's say for example you are a doctor and want to use the model for medical advisory reports, you would need to train the LLM with specific medical data. This additional trainin on top of the pre-training is called post-training or fine-tuning.

In our case it could be useful to post-train the model with specific business analysis documents and examples of mapping documents. We could also share details about our data infrastructure or our data model. This will support the model in determining context and will result in a more accurate output.

Gen AI chat vs Gen AI agents

Ever bought an airline ticket? If you did then you know the search for the best option can be tidious. There are multiple booking sites and all of them require the same input. It can take much of your time to book a holiday or business trip. Wouldn't it be perfect if this is done for you? Well this is where AI will come into the picture.

You have an advantage already because we ask the LLM for an analysis of the best flight options, but there are developments. The chat option retrieves information about our flight much more easily then google did before. However, there in the near future agents will be able to book your tickets automatically. You will ask agents to search for the best option and they will buy it after asking your permission. An AI agent is different from the chat functionality because it makes decisions and acts on these decisions. Gen Ai chat will only provide output and does not perform an action afterwards.

Although performing an action is the main criteria, agentic AI also identifies as AI that has longer term planning capabilities or has the ability to initiate actions on its own. As an example: the search functionality within a chatbot to show you todays weather forcast will therefore be a form of agentic ai, because it performs an action, but it does not have any long term planning capabilities.

If you want to read more about the topic, some references:

Atlan Team. (2023, December 22). What is Data Mapping? Steps, Techniques & More. Atlan.