

Sunday, August 25, 2024

IBM's Text-to-SQL Generator: How Generative AI is Revolutionizing Enterprise Data Insights and Queries

IBM recently launched a text-to-SQL generator that has made significant strides in handling complex database queries, ranking first on the BIRD benchmark. This solution, based on IBM's Granite code model, is part of IBM's broader effort to integrate generative AI into data services to help enterprises extract fresh insights from large databases.

As the volume of enterprise data surges—from website clicks to sales reports—companies are collecting and storing more data than ever before. However, the tools for searching across databases, data warehouses, and data lakehouses, and transforming this information into useful insights, have not kept pace with the data's growth. Many companies fail to fully utilize their data because employees either can't find the information they need or can't translate their questions into the code required to unlock the answers.

Generative AI is poised to simplify this process. Large language models (LLMs) are removing key barriers that currently make it difficult to search, retrieve, and transform tabular data. SQL is the dominant language for interacting with databases, yet within any given enterprise, only a limited number of individuals understand how large databases are structured and can query them in SQL. This effectively restricts who can access the data to uncover insights that could improve business operations.

To make enterprise data more accessible to a broader range of users, IBM and other tech companies have focused on teaching LLMs to write SQL. In a recent milestone, IBM's Granite code model topped the BIRD leaderboard, which measures how well LLMs can parse a natural language question and translate it into SQL to run on real data and answer the question.
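To make the idea of "translate it into SQL to run on real data and answer the question" concrete, here is a minimal sketch of an execution-based check: a generated query counts as correct when it returns the same rows as a reference query. It uses Python's built-in `sqlite3` module. The `sales` table, the sample queries, and the helper names (`run_query`, `execution_match`, `build_demo_db`) are hypothetical and chosen for illustration; this is not IBM's pipeline or BIRD's official evaluation code.

```python
import sqlite3


def run_query(conn, sql):
    """Run a query and return its rows in a canonical order, or None if it fails."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    # Sort so that two queries returning the same rows in a different order still match.
    return sorted(rows, key=repr)


def execution_match(conn, predicted_sql, reference_sql):
    """True when the predicted query runs and returns the same rows as the reference."""
    predicted = run_query(conn, predicted_sql)
    reference = run_query(conn, reference_sql)
    return predicted is not None and predicted == reference


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema, illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 100.0), ("west", 250.0), ("east", 50.0)],
    )
    conn.commit()
    return conn


conn = build_demo_db()

# Question: "What are the total sales per region?"
reference = "SELECT region, SUM(amount) FROM sales GROUP BY region"
# A differently worded but equivalent query, as a model might produce it.
good = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC"
# A plausible-looking query that answers a different question.
bad = "SELECT region, MAX(amount) FROM sales GROUP BY region"
# A query that does not parse at all.
broken = "SELEC region FROM sales"

for name, sql in [("good", good), ("bad", bad), ("broken", broken)]:
    print(name, execution_match(conn, sql, reference))
```

Comparing results rather than SQL text means a query is not penalized for being phrased differently from the reference, only for returning a different answer.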

IBM's text-to-SQL generator still has a long way to go. Despite being the top performer on BIRD, it answered only 68% of questions correctly, compared to the 93% accuracy achieved by engineers who participated in the test. However, considering the rapid progress LLMs have made in other programming tasks, such as refactoring COBOL code into Java, the gap between AI and human-generated SQL may soon narrow.

BIRD also scores execution efficiency, which reflects the computational resources required to run the AI-generated SQL against the database. On this measure, evaluators scored IBM's solution at 80, below the 90 scored by volunteer engineers, while other AI systems scored 65.
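As a rough illustration of how an efficiency score like this could be computed, the sketch below rewards each correct query by the square root of the ratio between the reference query's run time and the generated query's run time, then averages over all questions. The formula, the function name `efficiency_score`, and the timings are assumptions made for illustration; the article does not give BIRD's exact scoring rule, and this has not been checked against the official implementation.

```python
import math


def efficiency_score(results):
    """Average a time-based reward over all questions, scaled to 0-100 and beyond.

    results: list of (is_correct, reference_ms, predicted_ms) tuples.
    Incorrect queries earn 0. A correct query earns sqrt(reference_ms / predicted_ms),
    so matching the reference time earns 1.0 and running slower earns less.
    This scoring rule is an assumed form, not confirmed by the article.
    """
    if not results:
        return 0.0
    total = 0.0
    for is_correct, reference_ms, predicted_ms in results:
        if is_correct and predicted_ms > 0:
            total += math.sqrt(reference_ms / predicted_ms)
    return 100.0 * total / len(results)


# Hypothetical timings in milliseconds for four questions.
demo = [
    (True, 40.0, 40.0),   # correct, as fast as the reference -> reward 1.0
    (True, 10.0, 40.0),   # correct, four times slower -> reward 0.5
    (False, 20.0, 10.0),  # fast but wrong -> reward 0.0
    (True, 10.0, 40.0),   # correct, four times slower -> reward 0.5
]
score = efficiency_score(demo)
print(score)  # 50.0
```

Under this assumed rule, a wrong answer earns nothing no matter how quickly it runs, so the score reflects both correctness and speed.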

IBM's SQL code generator is just one of several technologies that IBM researchers are developing to help enterprises find, retrieve, transform, and visualize their data. IBM has already rolled out other LLM-powered components that enrich structured data with descriptions and business terminology, making database tables and columns easier to locate. These technologies were recently integrated into IBM's Knowledge Catalog and watsonx.data products.

“We're on a mission to drive AI into the entire data services pipeline,” said Lisa Amini, a research director at IBM who led the team developing the data enrichment technologies and SQL generator. “The features we're developing can help data stewards and engineers be more productive, and enable data and business analysts to reach insights faster.”

IBM researchers have designed a conversational graphical user interface (CGUI) that allows data engineers, stewards, and analysts to interact with their data through conversation. The CGUI combines the personal touch of an AI chat interface with the intuitive nature of a web-based GUI, helping users more easily interact with structured data and explore results.

In conclusion, IBM's text-to-SQL generator and its underlying Granite code model bring innovation to enterprise data services, enabling companies to more effectively extract valuable insights from vast amounts of data. This not only enhances data analysis efficiency but also opens up new avenues for non-technical users to access data. With IBM's continued innovation in generative AI and LLMs, we can expect even more powerful tools for data interaction and analysis, further driving transformation in enterprise data utilization.
