April 02, 2024
Anthropic researchers wear down AI ethics with repeated questions
How do you get an AI to answer a question it’s not supposed to? There are many such “jailbreak” techniques, and Anthropic researchers just found a new one, in which a large language model can be convinced to tell you how to build a bomb if you prime it with a few dozen less-harmful questions first. They call the approach “many-shot jailbreaking,” and have both written a paper about it and also informed their peers in the AI community about it so it can be mitigated. The vulnerability is a new one, resulting from the increased “context window” of the latest generation of LLMs. This is the amount of data they can hold in what you might call short-term memory, once only a few sentences but now thousands of words and even entire books. What Anthropic’s researchers found was that these models with large context windows tend to perform better on many tasks if there are lots of examples of that task within the prompt. So if there are lots of trivia questions in the prompt (or priming document, like a big list of trivia that the model has in context), the answers actually get better over time. So a fact that it might have gotten wrong if it was the first question, it may get right if it’s the hundredth question. But in an unexpected extension of this “in-context learning,” as it’s called, the models also get “better” at replying to inappropriate questions. So if you ask it to build a bomb right away, it will refuse. But if you ask it to answer 99 other questions of lesser harmfulness and then ask it to build a bomb… it’s a lot more likely to comply. Image Credits: Anthropic Why does this work? No one really understands what goes on in the tangled mess of weights that is an LLM, but clearly there is some mechanism that allows it to home in on what the user wants, as evidenced by the content in the context window. If the user wants trivia, it seems to gradually activate more latent trivia power as you ask dozens of questions. And for whatever reason, the same thing happens with users asking for dozens of inappropriate answers. The team already informed its peers and indeed competitors about this attack, something it hopes will “foster a culture where exploits like this are openly shared among LLM providers and researchers.” For their own mitigation, they found that although limiting the context window helps, it also has a negative effect on the model’s performance. Can’t have that — so they are working on classifying and contextualizing queries before they go to the model. Of course, that just makes it so you have a different model to fool… but at this stage, goalpost-moving in AI security is to be expected. Age of AI: Everything you need to know about artificial intelligence
Latest News
Top news around the world
Academy Awards

‘Oppenheimer’ Reigns at Oscars With Seven Wins, Including Best Picture and Director

Get the latest news about the 2024 Oscars, including nominations, winners, predictions and red carpet fashion at 96th Academy Awards

Around the World

Celebrity News

> Latest News in Media

Watch It
JoJo Siwa Reveals She Spent $50k on This Cosmetic Procedure
April 08, 2024
tilULujKDIA
Gypsy Rose Blanchard Files for Divorce from Ryan Anderson
April 08, 2024
kjqE93AL4AM
Bachelor Nation’s Trista Sutter Shares Update on Husband’s Battle With Lyme Disease | E! News
April 08, 2024
mNBxwEpFN4Y
Alan Tudyk Does All His Disney Voices
April 08, 2024
fkqBY4E9QPs
Bob Iger responds to critics who call Disney "too woke"
April 06, 2024
loZMrwBYVbI
Kirsten Dunst recites a classic cheer from 'Bring it On'
April 06, 2024
VHAca3r0t-k
Dr. Paul Nassif Offers Up Plastic Surgery Warning for Gypsy Rose Blanchard | TMZ
April 09, 2024
cXIyPm8mKGY
Reba McEntire Laughs at Joy Behar's Suggestion 'Jolene' is Anti-Feminist | TMZ TV
April 08, 2024
11Cyp1sH14I
NeNe Leakes Says She's Okay with Cheating If It's Done Respectfully | TMZ TV
April 08, 2024
IsjAeJFgwhk
Ben Affleck and Jennifer Lopez’s wedding was 20 years in the making
April 08, 2024
BU8hh19xtzA
Bianca Censori wears completely sheer tube dress and knee-high stockings for Kanye West outing
April 08, 2024
IkbdMacAuhU
Kelsea Ballerini tells trolls to ‘shut up’ about pantsless CMT Music Awards 2024 performance #shorts
April 08, 2024
G4OSTYyXcOc
TV Schedule
Late Night Show
Watch the latest shows of U.S. top comedians

Sports

Latest sport results, news, videos, interviews and comments
Latest Events
08
Apr
ITALY: Serie A
Udinese - Inter Milan
07
Apr
ENGLAND: Premier League
Manchester United - Liverpool
07
Apr
ENGLAND: Premier League
Tottenham Hotspur - Nottingham Forest
07
Apr
ITALY: Serie A
Juventus - Fiorentina
07
Apr
ENGLAND: Premier League
Sheffield United - Chelsea
07
Apr
ITALY: Serie A
Monza - Napoli
07
Apr
GERMANY: Bundesliga
Wolfsburg - Borussia Monchengladbach
07
Apr
ITALY: Serie A
Verona - Genoa
07
Apr
ITALY: Serie A
Cagliari - Atalanta
07
Apr
GERMANY: Bundesliga
Hoffenheim - Augsburg
07
Apr
ITALY: Serie A
Frosinone - Bologna
06
Apr
GERMANY: Bundesliga
Heidenheim - Bayern Munich
06
Apr
GERMANY: Bundesliga
Borussia Dortmund - Stuttgart
06
Apr
ENGLAND: Premier League
Brighton - Arsenal
06
Apr
ITALY: Serie A
Roma - Lazio
06
Apr
ENGLAND: Premier League
Crystal Palace - Manchester City
06
Apr
ITALY: Serie A
AC Milan - Lecce
04
Apr
ENGLAND: Premier League
Chelsea - Manchester United
04
Apr
ENGLAND: Premier League
Liverpool - Sheffield United
03
Apr
ENGLAND: Premier League
Arsenal - Luton
03
Apr
ENGLAND: Premier League
Manchester City - Aston Villa
02
Apr
ENGLAND: Premier League
West Ham United - Tottenham Hotspur
01
Apr
SPAIN: La Liga
Villarreal - Atletico Madrid
01
Apr
ITALY: Serie A
Lecce - Roma
01
Apr
ITALY: Serie A
Inter Milan - Empoli
31
Mar
ENGLAND: Premier League
Manchester City - Arsenal
31
Mar
SPAIN: La Liga
Real Madrid - Athletic Bilbao
31
Mar
ENGLAND: Premier League
Liverpool - Brighton
30
Mar
SPAIN: La Liga
Barcelona - Las Palmas
30
Mar
ENGLAND: Premier League
Brentford - Manchester United
30
Mar
ITALY: Serie A
Fiorentina - AC Milan
Find us on Instagram
at @feedimo to stay up to date with the latest.
Featured Video You Might Like
zWJ3MxW_HWA L1eLanNeZKg i1XRgbyUtOo -g9Qziqbif8 0vmRhiLHE2U JFCZUoa6MYE UfN5PCF5EUo 2PV55f3-UAg W3y9zuI_F64 -7qCxIccihU pQ9gcOoH9R8 g5MRDEXRk4k
Copyright © 2020 Feedimo. All Rights Reserved.