LLMs and Logical Reasoners: A Perfect Match or a Misunderstanding?
With the rise of large language models (LLMs) and their impressive capabilities, a crucial question emerges: can these models truly address all our needs independently, particularly those requiring logical reasoning?
To explore this, we designed two puzzles similar to Einstein’s Puzzle (Zebra Puzzle). We tested several prominent LLMs (in April 2024) on these puzzles, measuring their ability to draw logical conclusions. Additionally, we employed Pie, a reasoning engine, to solve the puzzles and explain their solutions clearly.
Puzzle One:
Deduce the Color of Amir’s House along with explanations:
1. A gray house is always constructed of concrete.
2. Cheap houses are exclusively gray or black.
3. Amir’s house is a metal building.
4. Houses are categorized solely as concrete or metal structures.
5. Houses that are not gray possess either a garden or a balcony.
6. Affordable houses invariably have a garden.
7. Houses with gardens are always constructed of metal.
8. White houses are either cheap or concrete.
9. Houses with balconies are either classified as cheap or affordable.
10. A house with a balcony and being affordable implies it is brown.
Puzzle Two:
Deduce the Color of Amir’s House along with explanations:
1. A gray house is always constructed of concrete.
2. Cheap houses are exclusively gray or black.
3. Amir’s house is a metal building.
4. Houses are categorized solely as concrete or metal structures.
5. Houses that are not gray possess either a garden or a balcony.
6. Affordable houses invariably have a garden.
7. Houses with gardens are always constructed of concrete.
8. White houses are either cheap or concrete.
9. Houses with balconies are either classified as cheap or affordable.
10. A house with a balcony and being affordable implies it is brown.
The two puzzles differ only in the seventh premise, and both ask for the color of Amir’s house. The first puzzle lacks sufficient information for a definitive answer, while the second allows a logical deduction: Amir’s house is black.
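Both claims can be checked mechanically. The following is a minimal sketch, not the method Pie uses: it enumerates every combination of properties for Amir’s house and collects the colors that survive all ten premises. The encoding rests on assumptions that the puzzle text does not state explicitly: colors are mutually exclusive, a catch-all color `other` stands for any unnamed color, a house is exactly one of concrete or metal, and `cheap` and `affordable` are independent flags. The function name `possible_colors` and its parameter are illustrative.

```python
from itertools import product

COLORS = ("gray", "black", "white", "brown", "other")  # "other" is an assumed catch-all
MATERIALS = ("concrete", "metal")  # premise 4: exactly one of the two (assumed exclusive)
BOOL = (False, True)


def implies(p, q):
    """Material implication: p -> q."""
    return (not p) or q


def possible_colors(garden_material):
    """Colors of Amir's house that are consistent with all premises.

    garden_material is the material required by premise 7:
    "metal" for puzzle one, "concrete" for puzzle two.
    """
    found = set()
    for color, material, garden, balcony, cheap, affordable in product(
        COLORS, MATERIALS, BOOL, BOOL, BOOL, BOOL
    ):
        premises = [
            implies(color == "gray", material == "concrete"),            # 1
            implies(cheap, color in ("gray", "black")),                  # 2
            material == "metal",                                         # 3
            implies(color != "gray", garden or balcony),                 # 5
            implies(affordable, garden),                                 # 6
            implies(garden, material == garden_material),                # 7
            implies(color == "white", cheap or material == "concrete"),  # 8
            implies(balcony, cheap or affordable),                       # 9
            implies(balcony and affordable, color == "brown"),           # 10
        ]
        if all(premises):
            found.add(color)
    return found


print(sorted(possible_colors("metal")))     # puzzle one: ['black', 'brown', 'other']
print(sorted(possible_colors("concrete")))  # puzzle two: ['black']
```

Under these assumptions, puzzle one leaves several colors open, so no single answer is entailed, while puzzle two leaves only black.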
Results and Analysis:
Copilot:
In both scenarios, Copilot introduces an irrelevant color (brown). It does so in the first puzzle despite the lack of evidence, and in the second puzzle despite a logical deduction being available. (Even when the prompt specifies that only the given premises may be used, Copilot still returns brown as the answer.)
[Screenshot: Copilot’s answer to puzzle one]
[Screenshot: Copilot’s answer to puzzle two]
Gemini:
For the first puzzle (which has no definitive answer), Gemini claims that Amir’s house is black:
Gemini gives the same answer for the second puzzle, which differs from the first only in the seventh premise. Although the answer itself is correct (the house is black), it is accompanied by erroneous conclusions, since it can be shown logically that in the second puzzle Amir’s house has no garden and does have a balcony.
Additionally, in both puzzles Gemini assumed that the answer must be one of the colors mentioned in the puzzle. Since it could not eliminate black, it chose black in both cases. If the initial prompt stipulates that the answer must be based solely on the premises, Gemini says in both cases that there is not enough information.
Even if we accept that it answered the second puzzle correctly, it is enough to tell it falsely, "Your answer is incorrect; tell me the correct answer," and it will simply withdraw its answer:
Claude3:
Claude3 mistakenly negates all offered colors in the first scenario, even though logic allows for black as a possibility:
Similarly, in the second scenario, it incorrectly concludes that Amir’s house is brown:
Pie Reasoner:
Pie, on the other hand, exemplifies logical reasoning. In the first scenario, it correctly concludes that no color can be determined definitively. In the second scenario, Pie deduces black as the house color and provides a clear logical explanation.
subclass of rule:
The house of Amir is a metal house.
Every metal house is a house.
—————-
The house of Amir is a house (1)
contraposition rule:
The house of Amir is a metal house.
Every concrete house is not a metal house.
—————-
The house of Amir is not a concrete house. (2)
contraposition rule:
Based on (2): The house of Amir is not a concrete house.
Every gray house is a concrete house.
—————-
The house of Amir is not a gray house. (3)
subclass of rule:
Based on (1): The house of Amir is a house.
Based on (3): The house of Amir is not a gray house.
Everything that is a house and isn’t a gray house is a thing that either has some balcony or has some garden.
—————-
The house of Amir is a thing that either has some balcony or has some garden. (4)
contraposition rule:
Based on (2): The house of Amir is not a concrete house.
Everything that has some garden is a concrete house.
—————-
The house of Amir is not a thing that has some garden. (5)
one branch is not closed in a non-deterministic rule:
Based on (4): The house of Amir is a thing that has some balcony or has some garden.
Based on (5): The house of Amir is not a thing that has some garden.
—————-
The house of Amir is a thing that has some balcony. (6)
subclass of rule:
Based on (1): The house of Amir is a house.
Based on (6): The house of Amir is a thing that has some balcony.
Everything that is a house and has some balcony is a thing that is an affordable house or a cheap house.
—————-
The house of Amir is a thing that is an affordable house or a cheap house (7)
contraposition rule:
Based on (5): The house of Amir is not a thing that has some garden.
Everything that is a house and affordable house is a thing that has some garden.
—————-
The house of Amir is not a thing that is a house and affordable house (8)
one branch is not closed in a non-deterministic rule:
Based on (8): The house of Amir is not a thing that is a house and affordable house.
Based on (1): The house of Amir is a house.
—————-
The house of Amir is not an affordable house (9)
one branch is not closed in a non-deterministic rule:
Based on (7): The house of Amir is a thing that is an affordable house or a cheap house
Based on (9): The house of Amir is not an affordable house
—————-
The house of Amir is a cheap house (10)
subclass of rule:
Based on (10): The house of Amir is a cheap house.
Based on (1): The house of Amir is a house.
Everything that is a cheap house and house is a thing that is a black house or gray house.
—————-
The house of Amir is a thing that is a black house or gray house. (11)
one branch is not closed in a non-deterministic rule:
Based on (11): The house of Amir is a thing that is a black house or gray house.
Based on (3): The house of Amir is not a gray house.
—————-
The house of Amir is a black house.
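The contraposition and branch-closing steps in this derivation can be imitated with unit propagation over clauses. The sketch below is an illustration under stated assumptions, not Pie's actual algorithm: each premise is rewritten by hand as a clause about Amir's house, a leading `~` marks negation, and the names `negate`, `unit_propagate` and `amir_clauses` are hypothetical. Whenever all but one literal of a clause are known to be false, the remaining literal is forced to be true.

```python
def negate(literal):
    """Flip a literal: 'gray' <-> '~gray'."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def unit_propagate(clauses):
    """Return every literal forced true by repeated unit propagation."""
    known = set()
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(lit in known for lit in clause):
                continue  # clause already satisfied
            # Literals whose negation is not yet known are still open.
            open_lits = [lit for lit in clause if negate(lit) not in known]
            if not open_lits:
                raise ValueError(f"premises are contradictory: {clause}")
            if len(open_lits) == 1:
                known.add(open_lits[0])  # the only way to satisfy the clause
                changed = True
    return known


def amir_clauses(garden_material):
    """Premises about Amir's house as clauses (hand-written encoding)."""
    return [
        ["~gray", "concrete"],               # 1: gray -> concrete
        ["~cheap", "gray", "black"],         # 2: cheap -> gray or black
        ["metal"],                           # 3: Amir's house is metal
        ["concrete", "metal"],               # 4: concrete or metal ...
        ["~concrete", "~metal"],             #    ... but not both (assumed)
        ["gray", "garden", "balcony"],       # 5: not gray -> garden or balcony
        ["~affordable", "garden"],           # 6: affordable -> garden
        ["~garden", garden_material],        # 7: garden -> metal / concrete
        ["~white", "cheap", "concrete"],     # 8: white -> cheap or concrete
        ["~balcony", "cheap", "affordable"], # 9: balcony -> cheap or affordable
        ["~balcony", "~affordable", "brown"],# 10: balcony and affordable -> brown
    ]


puzzle_two = unit_propagate(amir_clauses("concrete"))
puzzle_one = unit_propagate(amir_clauses("metal"))
print("black" in puzzle_two)  # True: black is derived in puzzle two
print("black" in puzzle_one)  # False: propagation stalls in puzzle one
```

In puzzle two the forced literals follow the same order as the numbered steps: not concrete, not gray, no garden, balcony, not affordable, cheap, black. In puzzle one propagation stops after "not gray", because the clause for premise 5 still has two open literals.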
LLMs vs. Logical Reasoners: A Key Distinction
The core difference lies in the purpose of each kind of system and, consequently, in how it arrives at answers. LLMs are built for language processing and text understanding, so they interpret sentences the way people commonly do. Logical reasoners are built to draw inferences from a knowledge base, so they apply the formal inference processes that underlie human logical thinking.
Relying solely on LLMs for logical reasoning presents several challenges:
- Training Data Bias: Training data for LLMs often lacks logical consistency, potentially leading them to learn inaccurate reasoning patterns.
- Scalability and Efficiency: Implementing comprehensive logical reasoning capabilities within LLMs requires immense computational resources.
- User Influence: If users mistakenly reject a correct LLM answer, it can nudge the model away from logical conclusions.
By having a line, you don’t need to maintain dots:
Relying solely on large language models (LLMs) for logical reasoning can be likened to storing an image by specifying the color of each pixel. Constantly increasing the resolution (the amount of data) may enhance clarity, but it is an inefficient and resource-intensive approach, which is why vector-based formats eventually came into use.
Logical reasoning systems, on the other hand, work more like image compression formats. They employ established rules and algorithms to represent complex relationships between propositions, significantly reducing the data burden. This allows them to achieve clear and accurate reasoning without the exorbitant training costs and hardware requirements associated with training LLMs for comprehensive logic capabilities.
In essence, logical rules act as a shorthand, efficiently capturing the essence of many propositions without the need for vast amounts of training data or powerful hardware. This highlights the importance of integrating LLM capabilities with dedicated reasoning engines to create a more efficient and effective AI architecture.
The Synergy of LLMs and Reasoners: The Future of AI
This analysis highlights the limitations of LLMs in purely logical domains. The future of AI likely lies in combining LLMs with dedicated logical reasoners like Pie, a combination that offers a promising path toward robust and accurate AI applications. (Retrieval-augmented generation, RAG, emerged in a similar way, because the retrieval abilities of LLMs alone were not reliable enough.)
This synergy offers numerous benefits:
- Improved Accuracy: Logical reasoning ensures accurate conclusions, particularly in domains such as medicine and law, where precision and explainability are critical.
- Efficiency: Logical reasoners provide a cost-effective approach to reasoning compared to the intensive training requirements of LLMs.