Anthropic, an artificial intelligence (AI) and “public benefit” company, launched Claude 2 on July 11, marking another milestone in a year full of seemingly nonstop progress from the burgeoning generative AI sector.
Introducing Claude 2! Our latest model has improved performance in coding, math and reasoning. It can produce longer responses, and is available in a new public-facing beta website at https://t.co/uLbS2JNczH in the US and UK. pic.twitter.com/jSkvbXnqLd
— Anthropic (@AnthropicAI) July 11, 2023
According to a company blog post, Claude 2 shows improvements across nearly every measurable category. Perhaps most noteworthy among the differences between it and its predecessor is how the researchers discuss their work.
There’s no mention of traditional machine learning benchmarking or computational scores against similar models in the blog post announcing Claude 2. Instead, Anthropic tested both Claude and Claude 2 head-to-head on numerous tests meant to represent real-world knowledge, skills, and problem solving.
Claude 2 beat its predecessor across the board on knowledge, coding, and other exams and, according to Anthropic, even scores well against human averages:
“When compared to college students applying to graduate school, Claude 2 scores above the 90th percentile on the GRE reading and writing exams, and similarly to the median applicant on quantitative reasoning.”
It is worth noting that many experts believe comparisons between human and AI test takers are of limited value due to the nature of human cognitive reasoning and the likelihood that a large language model’s training dataset contains test information. Essentially, tests designed for humans may not actually “test” an AI’s ability to reason or provide a proper demonstration of actual knowledge or skill.
Along with the launch of Claude 2, Anthropic debuted a beta version of a web-based “Talk to Claude” interface providing general access to the chatbot for users in the U.S. and U.K.
Cryptox conducted brief testing of the new version and, anecdotally speaking, the improvements are immediately noticeable. Claude 2 responded to our prompts near-instantly with clear, concise answers.
According to Anthropic, the new model’s prompt limit is 100,000 tokens, roughly the equivalent of 75,000 words. The site’s user interface indicates that users can upload PDF, TXT, CSV, and similar documents for parsing. However, this functionality did not work in our limited testing prior to publishing this article.
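For a rough sense of what that limit means in practice, the two figures above imply about 0.75 words per token. The sketch below uses that ratio to estimate whether a piece of text would fit in a single prompt. It is only a back-of-the-envelope illustration: the function names are hypothetical, splitting on whitespace is an assumption about what counts as a word, and Anthropic's actual tokenizer may count differently.

```python
# Figures quoted in the article: 100,000 tokens is about 75,000 words.
PROMPT_LIMIT_TOKENS = 100_000
EQUIVALENT_WORDS = 75_000


def estimate_tokens(text: str) -> int:
    """Estimate a token count from a whitespace word count.

    Assumption: the article's ratio (100,000 tokens per 75,000 words)
    holds for the text. A real tokenizer may give a different number.
    """
    words = len(text.split())
    # Integer ceiling division, so the estimate rounds up.
    return -(-words * PROMPT_LIMIT_TOKENS // EQUIVALENT_WORDS)


def fits_in_prompt(text: str) -> bool:
    """Return True if the estimated token count is within the quoted limit."""
    return estimate_tokens(text) <= PROMPT_LIMIT_TOKENS


if __name__ == "__main__":
    sample = "word " * 30_000  # a 30,000-word document
    print(estimate_tokens(sample), fits_in_prompt(sample))
```

By this estimate, a 30,000-word document would use well under half of the quoted limit.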