AI Alignment Research for Multi-Lingual Models
First edition of Humans of AI Safety, to peek into the people working to make Generative AI models safe to use
Welcome to the first edition of "Humans of AI Safety," where we shine a light on the people propelling the field of AI safety into the future. In an industry bustling with innovation and complexity, the human perspective often brings clarity and purpose to where Generative AI can go in the future.
This edition delves into personal journeys, celebrates triumphs, and confronts the challenges faced by those at the vanguard of ensuring AI serves and safeguards humanity's interests.
In this issue, we introduce Akash Kundu—a fervent advocate for aligning language models with human values and ethics. As a young researcher at Apart Research, Akash brings a fresh and critical eye to the multi-faceted world of AI Safety. Through a candid conversation, he shares his insights on the urgency of robust AI alignment, the fascinating quirks of multilingual models, and the role of community-driven research in bridging the gap between academic excellence and real-world applications.
Can you share your background and what led you to your current path?
I'm Akash Kundu, a second-year computer science and engineering student from Kolkata, India. Currently, I work as an AI Safety Research Fellow at Apart Research, focusing on aligning and evaluating language models.
Previously, I have collaborated as a machine learning engineer on global projects with Omdena, working with teams solving problems across various countries like the US, Egypt, Germany, Tanzania, and Nigeria.
My exposure has encompassed classical ML, natural language processing, and computer vision applications. However, my current focus lies in alignment research, specifically ensuring language models align with intended goals and human values.
What motivated you to focus on AI Safety, given the plethora of other lucrative fields in AI/ML? You could have chosen to train LLMs or work on building LLM products. What draws you to this field?
I did try building LLMs and implementing LLM products! That is when I realized that we don’t understand LLMs enough. For instance, an LLM might tell a political figure that their party is doing the right thing, simply to gain positive feedback, despite potential inaccuracies.
Every time an AI model does something seemingly right without understanding our objective, i.e. getting the right answer by using the wrong formula, there is a conflict between us and the AI’s interest which is not what we want.
It happens surprisingly more often than we expect it to. Rather than solely making models more capable without adequate understanding, I felt compelled to focus on robustly aligning AI systems to behave precisely as we want them to.
That’s exactly why I wanted to work in this field, and I aim to identify deceptive/misalignment behaviors of AI systems to avoid catastrophic risks from deploying such models at scale.
Could you describe your journey to Apart Lab and what attracted you to their mission?
I have a Vietnamese friend who asked me to join him for a weekend-long hackathon in November, where we had to submit an MVP research paper trying to evaluate LLMs. It was the model evaluations hackathon hosted by Apart Research and Apollo Research. We worked sleeplessly, and our paper placed third in the hackathon by peer review. Apart really liked our work, and they offered our team the chance to take this hackathon idea seriously and refine it to publish a conference research paper.
It was an amazing experience as we worked on a real research problem and got feedback from Jason, co-director of Apart, whenever we felt stuck or needed to run our ideas through him. I was working with teammates from the US and Vietnam while my mentor was sitting in Europe, and we somehow made it work. Apart empowers budding alignment researchers to get started in this field with no prior experience and turns them into confident individuals to pursue alignment research in the future.
What does your research at Apart entail, and what advice would you give aspiring student researchers who want to work at Apart?
At Apart, my team works specifically on multilingual jailbreaks. An example of a multilingual jailbreak would be asking ChatGPT to discuss steps to steal money from the bank in English, which it declines but happily answers if asked in Bengali. Some languages can jailbreak language models more easily than others. We are investigating why this happens and trying to interpret the patterns to explain the high variance between languages.
Apart does NOT take in researchers via standardized interviews after assessing their resumes. The best way to get into Apart is to participate in their alignment jams, i.e., weekend-long hackathons every month. The most promising teams are invited to join Apart Labs to refine their projects into conference papers.
The best part is that you do not need exceptional credentials (or any for that matter) to get in. All you need is to prove that you can get decent research output by performing well in the alignment jam, which you can consider to be your work test :)
For people who want to get into alignment research, check out this roadmap.
Multilingual support for Language Models has been extensively researched in Indian Languages through initiatives like IndicNLP or the Aya models by Cohere. Tell us more about how they can expand into AI safety and if you see any potential for AI Safety work
The strides made in enabling language models to support multiple languages over the past decade deserve commendation.
With less than 20% of the global population speaking English, ensuring multilingual support for language models is essential to ensure widespread access to this advanced technology.
Indic NLP has notably excelled in providing language support for numerous Indian languages. The capabilities of Aya LLM across different languages, coupled with its open-sourced dataset, offer valuable resources for advancing multilingual research.
However, there are concerns regarding the potential vulnerabilities of Aya or similar multilingual language models. The expansion to more languages increases the likelihood of vulnerabilities across multiple languages, and there's a risk of encountering misalignment or deceptive alignment issues, which are already present in LLMs primarily focusing on English, on a larger scale. While I haven't explored Aya extensively yet, delving into this exciting research direction holds promise and could be an interesting idea for an upcoming model evaluations alignment jam ;)
Looking ahead, where do you see yourself in five years, particularly in relation to AI Safety and Alignment?
I prefer to take a step-by-step approach to planning, as I've found that extensive long-term planning hasn't always yielded the best results for me in the past. With a couple of years remaining to complete my bachelor's degree, my focus is on delving deeply into the field of AI Alignment.
I aim to explore various sub-fields, identify my interests, and contribute to the field by publishing arXiv preprints and conference papers. Collaborating on research projects and building my professional profile are immediate priorities, as they will not only enrich my understanding but also enhance my prospects for pursuing a PhD at a reputable international university known for its emphasis on AI Safety and Alignment. Alternatively, I'm open to exploring opportunities in leading AI research organizations upon graduation. While I have outlined a few plans, I'm aware that circumstances may change, and I remain open to the possibilities that the future holds.
Time and again, we see big players in tech underplay the work of researchers especially in Trust, Ethics and Safety. Considering instances where corporations have sidelined ethics in AI, do you see any parallels with AI Safety?
While tech giants often downplay the role of researchers, particularly in areas of Trust, Ethics, and Safety, it's crucial to recognize their contributions, especially in light of instances where corporations have sidelined ethical considerations in AI development. In the realm of AI Safety, parallels can indeed be drawn.
Despite the remarkable capabilities demonstrated by neural networks, there remains a significant gap in our understanding of how they operate and what objectives they truly optimize for. This ambiguity raises concerns about distinguishing between scenarios when AI systems are genuinely good at performing a task and when they pretend to be good. While organizations may prioritize training LLMs with extensive datasets to excel in benchmarks, this approach exacerbates misalignment issues.
It's essential to acknowledge that prioritizing AI Safety may not offer the immediate gratification of training high-performance models, but neglecting it could lead to potentially catastrophic consequences, ripe for exploitation by malicious actors
This is the first edition of Humans of AI Safety. If you are an AI safety researcher and would like to contribute your story, please reach out to us here.