Falcon 180B [New OSS King] vs GPT4 [The Legend]

Prakhar Saxena
5 min read · Sep 7, 2023


Notification by TII for Falcon 180B.

Falcon 180B was released yesterday by TII (a research facility based in the UAE) as the new open-source king. With 180 billion parameters trained on 3.5 trillion tokens, it is undoubtedly the largest open-source model available. Hosting Falcon 180B requires more than 400 GB of memory, but fortunately, Hugging Face is hosting it for us to use. TII's report states that Falcon 180B performs on par with Google's PaLM 2 and sits right behind GPT4. So, I decided to test it out.

We will quickly compare: coding ability, reasoning, literature, knowledge, and multilingual ability.

Each section is scored out of 10, and the scores accumulate from section to section.

Coding ability

Prompt: Write a function that takes a list of integers as an input and returns the sum of all even numbers in the list. This function should be implemented in Python.

Falcon 180B-

GPT4-

Comments:

  • The task is very simple.
  • Both do a good job.
  • GPT4 provides more solutions and explains them.
  • Falcon — 8/10
  • GPT4 — 10/10
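For reference, a minimal Python solution along the lines both models produced (the function name is my own choice, not taken from either model's output):

```python
def sum_even(numbers):
    """Return the sum of all even integers in the list."""
    return sum(n for n in numbers if n % 2 == 0)

print(sum_even([1, 2, 3, 4, 5, 6]))  # 2 + 4 + 6 = 12
```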

Reasoning

Prompt: Blueberries cost more than strawberries.
Blueberries cost less than raspberries.
Raspberries cost more than strawberries and blueberries.
If the first two statements are true, the third statement is?

Falcon 180B-

GPT4-

Comments:

  • Both give correct answers.
  • Falcon — 18/20
  • GPT4 — 20/20

Literature

Prompt: Write a poem about AI in 32 words.

Falcon 180B-

GPT4-

Comments:

  • Neither poem has exactly 32 words (Falcon runs over, GPT4 under).
  • Falcon’s poem sounds scary (it is plotting against us).
  • Falcon — 26/30
  • GPT4 — 28/30

Knowledge

Prompt: When did the world war 2 end?

Falcon 180B-

GPT4-

Comments:

  • Both answer correctly.
  • GPT4 provides more detail; however, I think Falcon could do the same if the generation parameters were tweaked.
  • Falcon — 36/40
  • GPT4 — 38/40

Multilingual ability

We will test the models' responses across three languages from different families: English, Hindi, and Japanese, since these are the languages I understand reasonably well. (Flex)

Prompt: I am about to give you some instructions in Hindi, response to that in Japanese- दुनिया में कितने देश हैं?

Falcon 180B-

GPT4-

Comments:

  • The question I asked was “How many countries are there in the world”.
  • Falcon does not provide an answer; it just translates the question into Japanese, then outputs both the Hindi and the Japanese in Roman characters.
  • GPT4 correctly answers the question, showing that it understands all three languages.
  • Falcon — 42/50
  • GPT4 — 48/50

Conclusion

Open-source models have developed very rapidly. Falcon 180B is very capable in every aspect apart from multilingual ability. It will be fun to watch what the open-source community does with this model. The future looks fun.

P.S. I fed the whole article into Falcon 180B and GPT4 and asked each to generate a conclusion. Here are the results:

Falcon 180B-

Conclusion: Based on the tests conducted above, it’s evident that TII’s Falcon 180B is highly capable of performing at par with GPT 4 with a little room for improvement in coding tasks and providing better answers to the questions. Considering that this model is an open-source alternative for large language models, the results of this comparison are remarkable. The “open-source king” definitely has what it takes to compete with the best. With TII continuing to develop such models, we can expect even better performance from the open-source community in the future.

GPT4-

Conclusion

In a face-off between Falcon 180B, the largest open-source model recently released by TII, and GPT4, the results show that size isn’t the only determinant of capability. Across the five domains of Coding ability, Reasoning, Literature, Knowledge, and Multilingual abilities, GPT4 consistently scored higher.

While Falcon 180B boasts a massive parameter count and has impressive capacities, it doesn’t quite match GPT4 in terms of versatility, in-depth understanding, and multilingual comprehension. Especially notable was the multilingual test, where Falcon showed gaps in multi-language translation and understanding.

Both models exhibit proficiency in coding, showcasing their ability to generate Python code. The reasoning task revealed that both models can understand logical statements, and their knowledge of historical events like the end of World War 2 is commendable.

However, when it comes to creative tasks such as writing poetry, GPT4 seems to have a slight edge. While both didn’t stick to the word limit, GPT4’s rendition was found more appealing.

Overall, while Falcon 180B is a monumental achievement as an open-source model and certainly has its merits, GPT4 remains more versatile across various tasks. It’ll be exciting to see how these models evolve and what further innovations the AI community will bring in the future.

Haha, seems like they are dissing each other.


Prakhar Saxena

Generative AI enthusiast. Based in Japan. Masters in NLP, Kyoto University