Beyond the Dashboard
Enabling Real-Time Conversations with Enterprise Data
Getting the answers you need, when you need them: It's been a long road
The desire to let business users converse with information held in technical systems simply by asking questions in natural language is not new. Efforts to crack this nut have been underway for decades. One of the earliest was Microsoft’s English Query, introduced in 1998. Twenty-six years and countless efforts later, there is still no reliable system that truly unlocks chatting with your data. Why is it so hard?
In this white paper, we will:
Outline the essential capabilities required to process natural language questions across multiple systems
Examine the common hurdles encountered when implementing these solutions
Introduce Clarista's innovative approach to overcoming these obstacles
Share early feedback and development directions for Clarista moving forward
Advent of Generative AI
Star Trek fans, do you remember the Universal Translator, first seen on screen in 1966?
Fast forward 58 years and science fiction has become reality. Recent demonstrations of frontier models, such as GPT-4o and Claude Sonnet, have shown that real-time translation (including natural language to programming languages) is not only possible but already in use across a wide variety of applications. Despite these advances, answering natural language questions that require information from enterprise systems and databases remains very challenging. The problem is complex enough that even the largest cloud data platform companies publicly acknowledge its difficulty. It becomes more daunting still when the required data is distributed across multiple technical systems, which adds layers of complexity to the already difficult task of converting natural language questions into database queries.
To address this challenge, many solutions on the market limit the size and scope of the data against which a user can ask a question: a single spreadsheet, a single table, or at best a specific dashboard. At Clarista, we believe that this approach, while helpful in certain scenarios, provides only marginal improvements. Finding answers across multiple data sources remains a frontier in AI and data science that requires further innovation and development.
Clarista’s Mission
At Clarista's inception, we critically examined the potential and challenges of LLMs as a new data interaction technology. Starting with a clean slate, we pushed our thinking to its limits. Our resulting thesis for an ideal solution is as follows:
Everything starts with a question. All enterprise data efforts should be attributable to the quality and speed of the answers to the questions asked.
Context is everything. Answering the question is not sufficient. The solution needs to continuously learn and improve with usage.
Centralizing or copying data should never be required. Required data should be retrieved in real-time from multiple systems to reduce data movement.
Data should never be shared with third-party LLMs. The world of GenAI is new, and we need to treat it as such by being risk-averse to data leakage.
Time to value must be measured in seconds and minutes, not months. Automate low-value tasks to enable focus on high-impact business problems that drive value.
Our charter and our emphasis on user engagement led to our mission statement:
Answer 80% of questions real-time and 20% within 24 hours.
Solution Approach – Start with Questions
At the core of our development process lies a commitment to user-centric design. Rather than starting with technology, we placed our focus squarely on end-users. Our team conducted extensive interviews with professionals from multiple industries. We asked them to share the questions they would pose if given unlimited access to their data. This approach yielded invaluable insights that shaped our solution design. Our research revealed that most business inquiries fall into three distinct categories:
The 'What' Questions: These questions seek facts. Example: "What is our customer retention rate, and how has it trended over the past year?"
The 'Why' Questions: These delve into reasoning and causality. Example: "Why are our cancer treatment product sales underperforming in the Midwest compared to other regions?"
The 'How' Questions: These address future actions and strategic planning. Example: "How can we improve participation in our malaria eradication program?"
Prioritizing User Needs
Based on our findings, we decided to focus initially on the 'What' questions, as they deal with existing data and past events. We also realized that the 'Why' questions often lead to a series of 'What' questions for diagnosis. This insight led us to incorporate a collaboration feature in our solution, allowing multiple colleagues to ask and share questions, culminating in a summary that seeks to address the overarching 'Why' question. We temporarily excluded forward-looking 'How' questions from our initial scope, as they cannot be answered solely based on historical data. Our roadmap includes addressing these questions in future iterations, starting with a recommendation approach once we have solidified our solution for the 'What' and 'Why' questions.
Additional Key Insights
Our research also revealed some other crucial insights:
Business-specific language: Each function within a business often uses its own terminology and acronyms, understood internally but confusing to outsiders.
Preference for self-service and collaboration: Enterprise users generally prefer to find answers independently or collaborate with colleagues for quick results, rather than submitting IT requests for all but the most complex analyses.
Information accessibility challenges: Most users need to work across multiple enterprise systems to find and connect the answers they seek, which has led to a proliferation of end-user tools (Excel, Python notebooks, etc.) outside those systems.
Solution Design
Guided by our mission and solution approach, we divided our solution into four interconnected planes:
The Context Intelligence Plane interprets user questions and produces a Clarista Query described in business terms.
The Semantic Plane translates the Clarista Query into multiple technical queries to pull the required data from underlying technical systems.
The Control Plane ensures enterprise readiness by addressing trust, transparency, and security. It provides data quality assessment, usage tracking, role-based access controls, and dynamic data masking.
The Analytics Plane enables traditional data science workloads. It offers an integrated SQL and Python workbench, visual orchestration of data transformers, and automation capabilities to enrich base data with useful metrics.
These four planes work in harmony to provide a comprehensive solution that addresses the challenges of natural language querying of enterprise data while ensuring security, governance, and advanced analytics capabilities.
Semantic Plane: The Foundation of Data Accessibility
The Semantic Plane is built on Clarista's innovative Semantic Data Fabric© technology. This unique technology allows us to meet our objectives without copying customer data or sharing it with LLM models, while retrieving real-time information from multiple data platforms. Key features of our Semantic Data Fabric include:
Multi-platform connectivity: Allows real-time data retrieval from both cloud and on-premises data platforms.
Data Organization: Publishes role-relevant data in an internal ‘Data Marketplace’ or ‘Data Product Catalog’, with Domains representing business functions and PODs (also known as data products) representing platform-independent semantic data sets such as Clients, Products, Orders, and Financials.
AI-driven metadata creation: Automatically generates business-friendly metadata to represent underlying technical data.
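To make the idea of a platform-independent POD more concrete, the following is a minimal sketch of how business-friendly fields might be bound to physical columns on different platforms and then split into one technical query per platform. The names Pod, FieldBinding, and to_technical_queries, as well as the platform, table, and column names, are illustrative assumptions and not Clarista's actual data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldBinding:
    """Binds one business-friendly field to a physical column (hypothetical)."""
    platform: str  # illustrative platform label, e.g. "crm" or "warehouse"
    table: str
    column: str


@dataclass
class Pod:
    """A platform-independent semantic data set, such as Clients or Orders."""
    name: str
    domain: str
    fields: dict[str, FieldBinding]


def to_technical_queries(pod: Pod, wanted: list[str]) -> list[tuple[str, str]]:
    """Group requested semantic fields by (platform, table); emit one SELECT each."""
    grouped: dict[tuple[str, str], list[str]] = {}
    for name in wanted:
        b = pod.fields[name]
        grouped.setdefault((b.platform, b.table), []).append(f"{b.column} AS {name}")
    return [
        (platform, f"SELECT {', '.join(cols)} FROM {table}")
        for (platform, table), cols in grouped.items()
    ]


# Example POD whose fields live on two different (made-up) platforms.
clients = Pod(
    name="Clients",
    domain="Sales",
    fields={
        "client_id": FieldBinding("crm", "accounts", "acct_id"),
        "client_name": FieldBinding("crm", "accounts", "acct_nm"),
        "lifetime_revenue": FieldBinding("warehouse", "fact_revenue", "rev_total"),
    },
)

queries = to_technical_queries(clients, ["client_id", "client_name", "lifetime_revenue"])
for platform, sql in queries:
    print(platform, "->", sql)
```

The user only ever sees the semantic field names; the bindings decide which platform each field is fetched from. A fuller translation would also carry join keys and filters, which this sketch leaves out.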
Context Intelligence: Powering Natural Language Understanding
The Context Intelligence Plane drives the interpretation of user questions based on available business and data context. It handles the conversion of natural language queries into semantic data queries (Clarista Query), leveraging the metadata provided by the Semantic Data Fabric.
The Context Intelligence Plane consists of two key capabilities:
Clarista LLM Agents
Clarista's LLM Agents act as task coordinators, managing interactions between LLM models and other system components. They perform three critical functions:
Scoping: Identify the relevant metadata scope within Clarista to answer user questions.
Semantic Query: Compile semantic data queries (Clarista Queries) based on organizational context.
Interpretation: Describe the steps taken to create the Clarista Query based on the user’s question.
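The scoping function can be illustrated with a deliberately simple stand-in. The sketch below ranks PODs by keyword overlap between the question and each POD's business-friendly metadata. This paper does not describe how Clarista's agents actually perform scoping, so the overlap heuristic, the POD_METADATA contents, and the scope function are all assumptions made for illustration.

```python
import re

# Hypothetical business-friendly metadata for three PODs.
POD_METADATA = {
    "Orders": "order date quantity units sold sales amount product region",
    "Products": "product name category launch date list price",
    "Clients": "client name industry account manager region",
}

# Small illustrative stopword list; a real system would use something richer.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "our", "was", "were",
             "what", "how", "did", "its", "to", "by"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def scope(question: str, metadata: dict[str, str], min_overlap: int = 1) -> list[str]:
    """Return POD names ranked by token overlap with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(desc)), name) for name, desc in metadata.items()]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))  # best score first, then by name
    return [name for score, name in ranked if score >= min_overlap]


question = "What were sales by product and region in Q1?"
print(scope(question, POD_METADATA))                 # all PODs sharing at least one term
print(scope(question, POD_METADATA, min_overlap=2))  # only strongly matching PODs
```

Narrowing the metadata scope before query compilation keeps the prompt small and means the language model only needs to see metadata, not the underlying data.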
Clarista Context Intelligence
The Clarista Context Intelligence develops and maintains organizational data context through three input types:
Metadata: Business-friendly data definitions associated with technical data in enterprise systems.
User Interactions: Continuous learning from user queries and feedback.
Expert Feedback: Insights from data experts through a built-in verification workflow.
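One way to picture how the three input types could feed a single organizational context is a small term store in which each definition remembers its source. The sketch below is hypothetical: the ContextStore class, the source labels, and in particular the precedence rule (expert feedback over user interactions over generated metadata) are assumptions for illustration, not a description of Clarista's implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed precedence: a higher number wins when two sources define the same term.
PRIORITY = {"metadata": 0, "user_interaction": 1, "expert_feedback": 2}


@dataclass
class Definition:
    meaning: str
    source: str


class ContextStore:
    """Keeps one definition per business term, preferring more trusted sources."""

    def __init__(self) -> None:
        self._terms: dict[str, Definition] = {}

    def learn(self, term: str, meaning: str, source: str) -> bool:
        """Record a definition unless a more trusted one already exists."""
        key = term.lower()
        existing = self._terms.get(key)
        if existing and PRIORITY[existing.source] > PRIORITY[source]:
            return False
        self._terms[key] = Definition(meaning, source)
        return True

    def resolve(self, term: str) -> Optional[str]:
        """Look up a term case-insensitively; return None if unknown."""
        found = self._terms.get(term.lower())
        return found.meaning if found else None


store = ContextStore()
store.learn("ARR", "annual recurring revenue (auto-generated)", "metadata")
store.learn("ARR", "annual recurring revenue, net of discounts", "expert_feedback")
accepted = store.learn("ARR", "average room rate", "user_interaction")
print(accepted, store.resolve("arr"))
```

In this sketch, an expert-verified definition is not overwritten by a later, less trusted signal, which is one plausible way to let the context improve with usage without drifting.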
Bringing it together: Semantic + Context = Understanding
Here's a simple example of how the Context Intelligence and Semantic planes described above work together:
User asks: "What was our top-selling product in Q1, and how did its performance compare to the same period last year?"
The Clarista LLM Agents work with the Clarista Context Engine and the LLM to interpret the question, and ask the user to clarify whether the answer should be expressed in sales amount or in number of units sold.
Based on the answer, the LLM Agents produce a Clarista Query, applying joins, filters, and calculations on Clarista Data PODs.
The Semantic Plane converts the Clarista Query into multiple technical queries, across connected platforms.
Results are processed and presented to the user in an appropriate visual format, along with the explanation and supporting data.
This integrated approach provides users with intuitive access to their enterprise data without compromising on accuracy, speed or security. By leveraging the power of AI-driven metadata creation, context-aware query creation, and real-time data access, Clarista delivers on its promise of making enterprise data truly accessible and actionable for business users across the organization.
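The worked example above can be sketched end to end in a few lines. In the sketch below, two in-memory SQLite databases stand in for two separate enterprise platforms, and a plain dictionary stands in for a Clarista Query. The dictionary format, the table layouts, the sample rows, and the specific years are all invented for illustration; the real Clarista Query format is not described in this paper.

```python
import sqlite3

# Platform 1 (stand-in): transactional orders.
orders_db = sqlite3.connect(":memory:")
orders_db.execute(
    "CREATE TABLE orders (product_id INTEGER, order_date TEXT, quantity INTEGER, amount REAL)"
)
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2023-02-10", 2, 100.0),
        (2, "2023-03-05", 9, 300.0),
        (1, "2024-01-15", 3, 400.0),
        (1, "2024-03-20", 1, 200.0),
        (2, "2024-02-01", 10, 250.0),
        (2, "2024-05-01", 5, 900.0),  # outside Q1, so it is filtered out
    ],
)

# Platform 2 (stand-in): product catalog.
catalog_db = sqlite3.connect(":memory:")
catalog_db.execute("CREATE TABLE products (product_id INTEGER, name TEXT)")
catalog_db.executemany("INSERT INTO products VALUES (?, ?)", [(1, "Alpha"), (2, "Beta")])

# Whitelisted mapping from business metric to SQL expression (hypothetical).
METRIC_SQL = {"sales_amount": "SUM(amount)", "units_sold": "SUM(quantity)"}

# Hypothetical Clarista Query after the user chose "sales amount".
clarista_query = {
    "metric": "sales_amount",
    "period": ("2024-01-01", "2024-03-31"),
    "compare_to": ("2023-01-01", "2023-03-31"),
}


def metric_by_product(metric: str, start: str, end: str) -> dict:
    """Technical query 1: aggregate the metric per product on the orders platform."""
    sql = (
        f"SELECT product_id, {METRIC_SQL[metric]} FROM orders "
        "WHERE order_date BETWEEN ? AND ? GROUP BY product_id"
    )
    return dict(orders_db.execute(sql, (start, end)).fetchall())


def product_names() -> dict:
    """Technical query 2: fetch product names from the catalog platform."""
    return dict(catalog_db.execute("SELECT product_id, name FROM products").fetchall())


def answer(query: dict) -> dict:
    """Combine both result sets in memory; no data is copied between platforms."""
    current = metric_by_product(query["metric"], *query["period"])
    previous = metric_by_product(query["metric"], *query["compare_to"])
    top_id = max(current, key=current.get)
    prior = previous.get(top_id, 0)
    return {
        "product": product_names()[top_id],
        "current": current[top_id],
        "previous": prior,
        "change": current[top_id] - prior,
    }


result = answer(clarista_query)
by_units = answer({**clarista_query, "metric": "units_sold"})
print(result)
print(by_units)
```

With this sample data the two metrics name different top products, which illustrates why the clarification step in the example matters before any query is compiled.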
The Control Plane: Ensuring Enterprise Readiness
After developing the core system components to answer natural language questions, we shifted our focus to productization. We identified two additional solution spaces (planes) for making our product enterprise-ready and achieving our mission – Control Plane and Analytics Plane.
Recognizing the critical importance of trust, transparency, and data security in enterprise environments, we introduced a 'Control Plane' in Clarista to ensure:
Trust: Data quality assessment, automation, and audit
Transparency: Data catalog, usage tracking, and lineage
Security: Role-based access, auto-classifications, and dynamic data masking (e.g., PII masking)
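As an illustration of how role-based access and dynamic data masking can work together at read time, consider the minimal sketch below. The Role class, the PII_COLUMNS set, and the masking rule (keep the first character, replace the rest with asterisks) are assumptions chosen for brevity and do not describe Clarista's actual policy engine.

```python
from dataclasses import dataclass

# Hypothetical output of an auto-classification step: columns flagged as PII.
PII_COLUMNS = {"email", "phone"}


@dataclass
class Role:
    name: str
    allowed_columns: set[str]
    can_see_pii: bool = False


def mask(value: str) -> str:
    """Keep the first character and replace the rest with asterisks."""
    return value[:1] + "*" * (len(value) - 1) if value else value


def apply_policy(rows: list[dict], role: Role) -> list[dict]:
    """Filter columns by role, then mask PII values the role may not see in clear."""
    result = []
    for row in rows:
        visible = {}
        for col, val in row.items():
            if col not in role.allowed_columns:
                continue  # role-based access: the column is not returned at all
            if col in PII_COLUMNS and not role.can_see_pii:
                val = mask(str(val))  # dynamic masking: stored data is unchanged
            visible[col] = val
        result.append(visible)
    return result


rows = [{"client": "Ana", "email": "ana@example.com", "revenue": 1200}]
analyst = Role("analyst", allowed_columns={"client", "email"})
steward = Role("steward", allowed_columns={"client", "email", "revenue"}, can_see_pii=True)

print(apply_policy(rows, analyst))
print(apply_policy(rows, steward))
```

Because the policy is applied to query results rather than to stored data, the same source rows can serve users with different entitlements without maintaining separate copies.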
The Analytics Plane: Unlocking Traditional Data Science Workloads
To address the limitations of LLMs in handling complex quantitative and probabilistic functions, we developed Clarista Lab, a data science and processing workbench. It enables customers' data scientists and data engineers to enrich base data with useful metrics, which are then published into the Semantic Plane. Key capabilities include:
Integrated SQL and Python workbench for data science and engineering
Visual orchestration of drag-and-drop data transformers
Time-based and event-based automation of orchestrators
Monitoring and notifications
A Look Ahead
As we reflect on our mission to "Answer 80% of questions real-time and 20% within 24 hours," we're pleased with the progress made. Clarista has not only met this goal but has also expanded its utility beyond AI readiness, becoming a versatile tool for data governance, analytics, and cross-platform data integration. The success of Clarista demonstrates the immense potential of combining AI with enterprise data systems. We've shown that it's possible to create a solution that is both powerful and user-friendly, capable of handling complex queries while maintaining the highest standards of data security and governance.
Future Directions:
Enhanced LLM Agents: Continue enhancing our LLM Agents so they can handle increasingly complex tasks and multi-step reasoning.
Follow-up Questions: Leverage the Context Intelligence Plane to anticipate smart, relevant follow-up questions.
Predictive Analytics: Integrate our GenAI and traditional AI capabilities more tightly to answer not just "what" and "why" questions but also "what might happen" scenarios.
Deeper Industry-Specific Solutions: Provide out-of-the-box capabilities that address unique challenges in specific industries, from healthcare to finance and manufacturing.
Most excitingly, Clarista is advancing its capabilities to incorporate unstructured data analysis. This enhancement will enable multi-modal data analysis, combining insights from documents, PDFs and other unstructured sources with traditional structured data. This integration will facilitate more robust pattern recognition, cross-domain insights, and holistic decision support, pushing the boundaries of enterprise data analytics and fostering data-driven innovation across the organization.
Partner With Us
As Clarista continues to push the boundaries of enterprise data analytics, we invite you to join us in shaping the future of data-driven decision making. Experience the power of real-time, context-aware data conversations across your organization. Contact us today to schedule a demo and discover how Clarista can transform your enterprise data into actionable insights.