ChatGPT and GPT-3, the large language model (LLM) it is built upon, will have a transformative impact on data and analytics. These technologies could further augment analytics workflows and democratize data access for new groups of data users. While the technology is certainly exciting and analytics providers are already discussing how it will affect their roadmaps, true transformational change will not happen overnight. The industry will evolve with this technology over the course of years.
There’s a lot of misunderstanding about what ChatGPT and LLMs are and what they are not. In this article, we explain what ChatGPT is and describe ways in which ChatGPT and LLMs like GPT-3 (Generative Pre-trained Transformer 3) could augment data and analytics in the future.
What is ChatGPT, and what are LLMs?
ChatGPT, first introduced in November 2022, has become a household name in only a few short months. ChatGPT, an AI chatbot developed by OpenAI, is built on OpenAI’s GPT-3 family of LLMs. LLMs are foundational models that can read, summarize, predict, and generate text. I’ll let ChatGPT describe itself:
GPT-3 is an LLM that generates text using pre-trained algorithms. Pre-trained, in this case, means that GPT-3 has been trained on data necessary to complete its function. This is also an example of what is known as generative AI. OpenAI has fed GPT-3 information from a variety of sources, including Common Crawl’s dataset (a resource that crawls the web and freely archives web pages), Wikipedia, and select additional texts. ChatGPT is built upon a more advanced version of GPT-3 referred to as GPT-3.5. ChatGPT, and its underlying GPT-3.5 model, was trained on around 300 billion words in total from books, text on the web, Wikipedia, code libraries, and articles. The model was then fine-tuned with human feedback.
It’s important to keep in mind that ChatGPT is not truly artificial intelligence. ChatGPT cannot learn in a way that humans can, nor does it create any truly new content. Rather, ChatGPT recombines information it has been fed to answer specific questions it has been given. It runs through this task in a highly contextualized manner, allowing it to provide a satisfactory answer to the question it was posed. It does not reason or understand a task.
Some of the tech giants, including Microsoft and Google, have already announced plans to incorporate ChatGPT, OpenAI’s models, or comparable LLMs into their offerings. Specifically, Microsoft made its Azure OpenAI Service generally available in January 2023. Google has announced Bard, which it describes as an experimental conversational AI service built on its LaMDA LLM. Salesforce, Baidu, and Alibaba have also made announcements about their upcoming work with similar services leveraging generative AI and LLMs. Big tech’s moving in sync toward this exciting new paradigm shows how much potential they see in this underlying technology.
In addition, data and analytics companies have already begun announcing new related features in their roadmaps. There is tremendous opportunity to incorporate these tools into business intelligence and analytics.
Top 5 Ways GPT and LLMs Could Influence BI/Analytics
1. Analytics chatbots
The simplest way to imagine leveraging GPT in data and analytics is a chatbot. We’ve already seen a number of BI and analytics companies announce plans to integrate GPT so that users can ask questions of their data conversationally and receive an answer in natural language. A chat interface leveraging GPT-3 could translate natural language text into SQL queries for business users who may lack the knowledge to write those queries themselves. A few BI and analytics companies are even going a step further by providing not only an answer to a question but also recommended visualizations, charts, and graphs to help explain the answer.
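As a rough illustration of the natural-language-to-SQL idea, here is a minimal Python sketch. It is not any vendor's implementation. The model call is represented by a plain function argument (complete), and fake_complete is a hypothetical stand-in that returns a canned query so the example runs without a real LLM. The sales table, the prompt wording, and the read-only check are all assumptions made for illustration.

```python
import sqlite3
from typing import Callable, List, Tuple


def build_prompt(schema: str, question: str) -> str:
    """Combine the table schema and the user's question into one prompt."""
    return (
        "Translate the question into a single SQLite SELECT statement.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
        "SQL:"
    )


def is_read_only(sql: str) -> bool:
    """Very rough guard: accept only one statement that starts with SELECT."""
    stripped = sql.strip().rstrip(";").strip()
    return stripped.lower().startswith("select") and ";" not in stripped


def answer_question(
    conn: sqlite3.Connection,
    schema: str,
    question: str,
    complete: Callable[[str], str],
) -> List[Tuple]:
    """Ask the model for SQL, refuse anything that is not a SELECT, then run it."""
    sql = complete(build_prompt(schema, question)).strip()
    if not is_read_only(sql):
        raise ValueError(f"Refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


def fake_complete(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a canned query."""
    return "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"


# Hypothetical example data.
SCHEMA = "CREATE TABLE sales (region TEXT, amount REAL)"
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 5.0), ("east", 2.5)],
)

rows = answer_question(conn, SCHEMA, "What are total sales by region?", fake_complete)
print(rows)
```

In a real tool, complete would wrap a call to whichever model the vendor uses, and the guard would need to be considerably stricter than a prefix check.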
In addition to answering questions in a conversational manner, chatbots could also be used as a front-end interface for reviewing documentation in the future. Rather than having to use Google or a search engine within documentation, you could potentially imagine a world where you can ask a specific question to a GPT-enabled chatbot and receive a highly contextualized answer to your question.
2. Data storytelling
Business intelligence and analytics vendors have struggled since the dawn of the industry to translate data into consumable insights for non-technical stakeholders. LLMs may provide a path forward. As mentioned above, GPT could provide a front-end interface for conversational analytics where users can simply ask a question in a natural manner and receive an output in the form of a visualization. GPT could also provide a narrative around the insight, shortening the path from visualization to insight and helping to democratize data access for non-technical users.
Furthermore, generative AI-based narratives may also open new lines of inquiry for data analysts. These narratives will help to augment data analysts’ existing workflows by simplifying the analysis process and freeing analysts to spend more time asking new and more meaningful questions of the data. Combining conversational analytics with automated visualization and narrative generation will provide substantial support for anyone who needs to extract insights from data.
3. Data preparation, enrichment, and cleansing
Another way to integrate GPT and LLMs into data and analytics platforms is to have them assist with data preparation. GPT can support code development: you simply ask related questions within the interface. One example is GRID’s formula assistant, which lets you describe a formula in natural language and then returns the formula, eliminating the need to memorize Excel or Google Sheets formulas.
More complex analytics projects could be translated from natural language to code by an LLM or GPT in the data preparation process. As mentioned, the ability to translate a conversational question into a SQL query could be used in the data pipelining process. LLMs could translate a question into a join statement automatically. Instead of writing scripts for data ingestion and transformation, data engineers might simply generate code via a conversation interface. In addition to SQL, LLMs could empower data analysts to run data science activities by providing a natural language interface to Python code generation. With the above examples of GPT’s capabilities, you can easily see how a user might be able to use these tools to create new data pipelines in a conversational manner.
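If generated SQL is going to feed a data pipeline, it helps to check it before it runs. The sketch below shows one possible check, assuming SQLite as the engine: it builds an empty in-memory copy of the schema and asks SQLite to plan the statement with EXPLAIN QUERY PLAN, which fails on syntax errors or unknown tables and columns without touching real data. The table names, the two example join statements, and the validate_sql helper are hypothetical.

```python
import sqlite3
from typing import List, Tuple


def validate_sql(schema_ddl: List[str], sql: str) -> Tuple[bool, str]:
    """Compile `sql` against an empty copy of the schema without executing it.

    Returns (True, "ok") if SQLite can plan the statement, otherwise
    (False, <error message>).
    """
    conn = sqlite3.connect(":memory:")
    try:
        for ddl in schema_ddl:
            conn.execute(ddl)
        # EXPLAIN QUERY PLAN prepares the statement and reports the plan;
        # the query itself is not run.
        conn.execute("EXPLAIN QUERY PLAN " + sql)
        return True, "ok"
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()


# Hypothetical schema for illustration.
DDL = [
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)",
]

# Pretend these two statements came back from an LLM.
good_join = (
    "SELECT c.name, SUM(o.total) FROM customers c "
    "JOIN orders o ON o.customer_id = c.id GROUP BY c.name"
)
bad_join = (
    "SELECT c.name, SUM(o.total) FROM customers c "
    "JOIN orders o ON o.client_id = c.id GROUP BY c.name"  # column does not exist
)

good_ok, good_msg = validate_sql(DDL, good_join)
bad_ok, bad_msg = validate_sql(DDL, bad_join)
print(good_ok, good_msg)
print(bad_ok, bad_msg)
```

A check like this only confirms that the statement is well-formed against the schema. It says nothing about whether the join expresses what the user actually asked for, so human review remains necessary.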
4. Synthetic data generation
LLMs could aid with synthetic data generation to help train machine learning models. Synthetic data is artificially generated data that does not correspond to real-world records. This type of data is used for testing and training purposes in data science. Because generative AI is designed to produce new content that resembles its training data, it may be well suited to this purpose. Because of the sensitive nature of data, many companies leverage synthetic data to avoid regulatory constraints. Often simpler to generate and use than real-world data, synthetic data can also lead to faster time to value for data science activities. With generative AI like GPT-3, companies could more easily create synthetic data to assist with model generation, leading to higher-impact data science decision-making.
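One way this could look in practice is to ask a model for rows in a fixed JSON shape and keep only the rows that match the expected schema. The Python sketch below assumes exactly that workflow. The column names, the prompt wording, and fake_generate (a hypothetical stand-in that returns canned JSON instead of calling a model) are all illustrative assumptions.

```python
import json
from typing import Callable, Dict, List

# Hypothetical target schema: column name -> expected Python type.
SCHEMA: Dict[str, type] = {"customer_id": int, "age": int, "plan": str}


def build_prompt(schema: Dict[str, type], n_rows: int) -> str:
    """Describe the desired columns and ask for a JSON array of objects."""
    columns = ", ".join(f"{name} ({t.__name__})" for name, t in schema.items())
    return (
        f"Generate {n_rows} fictional customer records as a JSON array of objects "
        f"with exactly these fields: {columns}. Return only JSON."
    )


def parse_rows(raw: str, schema: Dict[str, type]) -> List[dict]:
    """Parse model output and keep only rows that match the schema exactly."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("Expected a JSON array of records")
    valid = []
    for row in data:
        if (
            isinstance(row, dict)
            and set(row) == set(schema)
            and all(type(row[name]) is expected for name, expected in schema.items())
        ):
            valid.append(row)
    return valid


def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; the last row is deliberately malformed."""
    return json.dumps(
        [
            {"customer_id": 1, "age": 34, "plan": "basic"},
            {"customer_id": 2, "age": 51, "plan": "premium"},
            {"customer_id": 3, "age": "unknown", "plan": "basic"},
        ]
    )


def synthesize(n_rows: int, generate: Callable[[str], str]) -> List[dict]:
    """Request rows from the model and return only the valid ones."""
    return parse_rows(generate(build_prompt(SCHEMA, n_rows)), SCHEMA)


rows = synthesize(3, fake_generate)
print(rows)
```

The validation step matters because model output is not guaranteed to follow the requested format. Checks for statistical realism or privacy leakage are out of scope for this sketch.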
5. Truly next-generation self-service analytics
Taken in aggregate, the ways in which GPT could influence business intelligence and analytics are helping the world enter the next stage of self-service analytics. Over the course of the last two decades, data and analytics vendors have taken large strides toward making self-service achievable. From low-code to no-code and keyword search to natural language search, the journey to self-service has been a long march of progress. LLMs and GPT-like services provide the next step in self-service analytics by making the entire process of moving from data to insight achievable by a greatly expanded audience. We’ve reached the point where self-service analytics might actually be available to everyone in your organization.
In the near future, you will be able to open your business intelligence or analytics tool of choice to ask a question and receive a visualization with a text-based narrative explaining the key insight in seconds. This process will remove time-consuming analysis work and open self-service data access to even more members of your organization. The leap from last-generation self-service to this new paradigm will enable your organization to increasingly leverage data in meaningful ways.
So, ChatGPT, how do you think GPT will influence data and analytics?
Cautions and the Future of LLMs in Analytics
While the new technological developments around GPT and LLMs are exciting, you should also be cautious about how you employ the technology in the short term. It’s not going to put data analysts, engineers, and scientists out of work: humans still need to review the work done by these tools, as the underlying LLM can still make mistakes in its interpretation of the information provided. In addition, foundational models like GPT are compute-intensive, which makes their everyday use impractical at this point. Humans also still need to make the decisions, as we’re not yet at the point where GPT and LLMs should be making decisions based on the information they’ve generated.
Just as organizations should use caution when employing LLMs and GPT in their processes, they should also avoid vendors that leverage GPT and LLMs in a bolt-on manner, e.g., legacy business intelligence and analytics vendors whose tools were not built from the ground up to incorporate them. As with previous technological advancements, some vendors will jump into a hyped new development, get lackluster results, and quickly regress to tried-and-true methods. As an example, a few vendors in the business intelligence space tried natural language search a few years ago but had to fall back to keyword search when they failed to implement the technology successfully.
The next few years will be an exciting time for the technological development of LLMs. The incorporation of LLMs and ChatGPT-like interfaces into our daily lives is almost inevitable at this point. As with most revolutionary technologies, the hype is at its peak today, and the immediate trough of disillusionment is coming shortly.
It’s going to take some time before we see the technology incorporated into many of our workflows productively. And when it finally comes, it might not even be recognizable to the end-user as having been built with GPT or another LLM. Many of our tasks will be changed for the better in the long term—but maybe for the worse in the short term as businesses grapple with the proper application of the technology.
If you’d like to learn more about the plans Tellius has for GPT and LLMs, come visit our booth (#1324) at Gartner’s Data & Analytics Summit on March 20-22.