Greetings fellow AI enthusiasts!
Another thrilling week has unfolded in the ever-evolving domain of artificial intelligence. Our newsletter is packed with the latest news and developments from the AI landscape.
In this week’s AI news, ByteDance is developing its own competitor using OpenAI’s technology. Mistral has introduced a new open-source AI model, Mixtral 8x7B. Microsoft Research has unveiled a 2.7 billion-parameter SLM, Phi-2. Alibaba has a new method for high-quality image-to-video synthesis, I2VGen-XL. And much more. Stay tuned and subscribe for more updates in the exciting world of AI!
💥 ByteDance is secretly using OpenAI’s tech to build its competitor
🤖 Mistral unveils a breakthrough open-source AI model: Mixtral 8x7B
🧨 Microsoft Research has unveiled Phi-2, a 2.7 billion-parameter SLM
🖼️ Stable Zero123: A new model for generating novel views of 3D objects
🧠 OpenAI is forming a new team to bring ‘superintelligent’ AI under control
📹 Alibaba unveils a new method for high-quality image-to-video synthesis: I2VGen-XL
☁️ Google Cloud introduces MedLM, a family of AI models for healthcare
⚙️ Midjourney Alpha: A New Way to Generate Images with AI
👓 Meta’s Ray-Ban smart glasses can now answer questions based on what you see
🧸 Curio launches Grok, an AI toy that can talk to kids
🤝 Axel Springer and OpenAI partner to boost journalism with AI
And more!
ByteDance is secretly using OpenAI’s tech to build its competitor
ByteDance, the parent company of TikTok, has been caught red-handed using OpenAI's and Microsoft's proprietary AI technology to develop its own competing large language model, Project Seed. This violation of terms of service suggests that ByteDance is under immense pressure to catch up in the generative AI race. In response, OpenAI has suspended ByteDance's access, and Microsoft has stated that it will enforce its code of conduct to address such misuse.
The details:
Project Seed is a high-priority, secretive initiative inside ByteDance that started about a year ago. Employees working on it have to sign separate nondisclosure agreements, and information access within the project has become increasingly siloed over time. Zhang Yiming, ByteDance’s billionaire co-founder and former CEO, keeps close tabs on its progress.
OpenAI’s terms of service state that its model output can’t be used “to develop any artificial intelligence models that compete with our products and services.” Microsoft, through which ByteDance buys its OpenAI access, has the same policy. However, internal ByteDance documents show that the OpenAI API has been relied on to develop Project Seed during nearly every phase of development, including for training and evaluating the model.
Employees involved are well aware of the implications; conversations on Lark, ByteDance’s internal communication platform for employees, show that they have discussed how to “whitewash” the evidence through “data desensitization.” The misuse is so rampant that Project Seed employees regularly hit their max allowance for API access.
A few months ago, ByteDance ordered the team to stop using GPT-generated text in “any stage of model development,” the internal documents show. It was around this time that the company gained regulatory approval in China to release Project Seed through a chatbot platform called Doubao. However, the API continues to be used in ways that violate OpenAI’s and Microsoft’s terms of service, including for evaluating the performance of ByteDance’s model behind Doubao.
After this story was published, OpenAI spokesperson Niko Felix sent a statement confirming that ByteDance’s account has been suspended: “All API customers must adhere to our usage policies to ensure that our technology is used for good. While ByteDance’s use of our API was minimal, we have suspended their account while we further investigate. If we discover that their usage doesn’t follow these policies, we will ask them to make necessary changes or terminate their account.”
Why it’s important:
ByteDance’s secret use of OpenAI’s API to build a competitor shows how fierce the competition is in the generative AI space, and how some companies are willing to cut corners and violate ethical norms to gain an edge. It also raises questions about the accountability and responsibility of the AI providers and users, and the potential risks and harms of misusing such powerful technology. As generative AI becomes more accessible and ubiquitous, it is crucial to establish clear and enforceable guidelines and regulations to ensure its safe and beneficial use.
Mistral unveils a breakthrough open-source AI model: Mixtral 8x7B
Mistral AI, a company that aims to deliver the best open models to the developer community, has released Mixtral 8x7B, a high-quality sparse mixture-of-experts model (SMoE) with open weights. Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. It is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs.
Mixtral is a decoder-only model that uses a fraction of the total parameters per token, thanks to a router network that chooses two experts from a set of eight to process each token. Mixtral has 46.7B total parameters but only uses 12.9B parameters per token. It is pre-trained on data extracted from the open Web and can be finetuned into an instruction-following model that achieves a score of 8.3 on MT-Bench.
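The top-2 routing described above can be sketched in a few lines. Below is a toy illustration of the mechanism only; the tiny linear “experts” and random weights are stand-ins for Mixtral’s actual transformer feed-forward blocks, and the dimensions are invented for readability:

```python
import math
import random

# Toy top-2 mixture-of-experts (MoE) router, illustrating how a router
# network picks 2 of 8 experts per token so only a fraction of the total
# parameters is used for each token.

NUM_EXPERTS = 8   # Mixtral has 8 experts per MoE layer
TOP_K = 2         # the router activates 2 of them per token
DIM = 4           # toy hidden size (hypothetical)

random.seed(0)
# Hypothetical router weights; each "expert" here is just a scalar multiplier.
router_w = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
expert_w = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(token):
    """Route one token vector through its top-2 experts."""
    # 1. The router assigns one logit per expert.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_w]
    # 2. Keep only the two highest-scoring experts.
    top = sorted(range(NUM_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    # 3. Renormalize their scores so the gate weights sum to 1.
    gates = softmax([logits[i] for i in top])
    # 4. The output is the gate-weighted sum of the chosen experts' outputs.
    out = [0.0] * DIM
    for g, i in zip(gates, top):
        for d in range(DIM):
            out[d] += g * expert_w[i] * token[d]
    return top, gates, out

experts_used, gates, _ = moe_forward([1.0, -0.5, 0.3, 0.8])
print(len(experts_used), round(sum(gates), 6))  # → 2 1.0
```

The key property is visible in step 2: however many experts the layer stores, only two run per token, which is why Mixtral carries 46.7B parameters but spends only about 12.9B per token.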
The details:
Mixtral matches or outperforms Llama 2 70B, as well as GPT-3.5, on most standard benchmarks, such as LAMBADA, PIQA, and Winogrande.
Mixtral is more truthful and less biased than Llama 2, according to the TruthfulQA/BBQ/BOLD benchmarks.
Mixtral masters five languages: French, German, Spanish, Italian, and English.
Mixtral can be gracefully prompted to ban certain outputs, which is useful for building applications that require a strong level of moderation.
Mixtral can be run with a fully open-source stack, thanks to the vLLM project and the Skypilot framework.
Why it’s important:
Mixtral is a remarkable achievement for the field of AI, as it demonstrates the power and efficiency of sparse mixture-of-experts networks. By making Mixtral 8x7B freely accessible, Mistral is not only democratizing AI but also showcasing the growing prowess of the European AI ecosystem. This open-source initiative aligns with the EU's ambition to establish itself as a global leader in AI research and development, fostering innovation and driving economic growth across the continent.
Microsoft Research has unveiled Phi-2, a 2.7 billion-parameter SLM
Microsoft Research has unveiled Phi-2, a new small language model (SLM) with 2.7 billion parameters that demonstrates outstanding reasoning and language understanding, surpassing the performance of larger models on various benchmarks.
The details:
Phi-2 is the latest addition to Microsoft’s suite of small language models (SLMs), following Phi-1 and Phi-1.5.
Phi-2 is trained on a mixture of synthetic and web datasets, curated to teach the model general knowledge, science, daily activities, and theory of mind, among others.
Phi-2 outperforms Mistral’s 7B model and Llama-2’s 13B model on tasks such as Big Bench Hard, math, coding, and language understanding. It also matches or beats Google’s Gemini Nano 2, despite being smaller in size.
Phi-2 is available in the Azure AI Studio model catalog, where researchers can explore its capabilities and experiment with fine-tuning, interpretability, and safety improvements.
Phi-2 is a base model that has not undergone alignment through reinforcement learning from human feedback (RLHF), nor has it been instruct fine-tuned. However, it shows better behavior with respect to toxicity and bias compared to existing models that went through alignment.
Why it’s important:
Phi-2 challenges the conventional wisdom that language models need to be massive in order to achieve high performance and emergent abilities. By using strategic choices for training data and model scaling, Phi-2 achieves remarkable results with a compact size, making it more accessible and efficient for researchers and developers. Phi-2 also opens new avenues for investigating the mechanisms and limitations of language models, as well as their potential applications and impacts.
Stable Zero123: A new model for generating novel views of 3D objects
Stability AI, the company behind Stable Diffusion, has released Stable Zero123, a new model for generating novel views of 3D objects with improved quality. The model is based on Stable Diffusion 1.5 and uses an improved dataset and elevation conditioning for higher-quality predictions. The model is released for non-commercial and research purposes and can be used to create 3D objects from images.
The details:
Stable Zero123 is trained on a heavily filtered, high-quality dataset from Objaverse, with objects rendered more realistically than in previous methods.
The model uses elevation conditioning, which provides the model with an estimated camera angle during training and inference, to make more informed predictions.
The model uses a pre-computed dataset and an improved dataloader to achieve a 40X speed-up in training efficiency compared to Zero123-XL, the previous state-of-the-art model.
The model can be used with the improved open-source code of threestudio to create 3D objects from images using Score Distillation Sampling (SDS), a technique that optimizes a NeRF using the Stable Zero123 model.
Why it’s important:
Stable Zero123 is a significant advancement in the field of 3D object generation, as it demonstrates a better understanding of the object’s appearance from various angles and produces higher quality results. The model can enable open research in 3D object generation and text-to-3D generation, as well as potential applications in gaming, entertainment, education, and more.
OpenAI is forming a new team to bring ‘superintelligent’ AI under control
OpenAI, a leading AI research company, announced that it is creating a new team dedicated to solving the problem of superintelligence alignment: the challenge of ensuring that AI systems much smarter than humans follow human intent and do not harm humanity. It also launched a $10M grants program to support technical research on the alignment and safety of superhuman AI systems, including weak-to-strong generalization, interpretability, and scalable oversight.
The details:
The new team, called Superalignment, will be co-led by Ilya Sutskever, OpenAI’s chief scientist and one of its co-founders, and Jan Leike, a lead on the alignment team at OpenAI.
The team will have access to 20% of the compute that OpenAI has secured to date, and will aim to solve the core technical challenges of controlling superintelligent AI within four years.
The team’s approach is to build a roughly human-level automated alignment researcher, which can then use vast amounts of compute to scale its efforts and iteratively align superintelligence.
The team will leverage techniques such as scalable oversight, automated interpretability, robustness, and adversarial testing to train, validate, and stress test their alignment pipeline.
The team is looking for excellent ML researchers and engineers to join them and work on this crucial problem.
Why it’s important:
Superintelligence, or AI with intelligence exceeding that of humans, could be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But it could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction. OpenAI believes that superintelligence could arrive this decade, and that we need new scientific and technical breakthroughs to steer and control it. By forming a new team and dedicating significant resources to this problem, OpenAI is showing its commitment to ensuring that AI is aligned with human values and beneficial for humanity.
Alibaba unveils a new method for high-quality image-to-video synthesis: I2VGen-XL
Alibaba researchers have introduced I2VGen-XL, a diffusion-based method that generates high-quality, semantically accurate videos from a single static image.
The details:
I2VGen-XL consists of two stages: the base stage and the refinement stage. The base stage guarantees coherent semantics and preserves content from input images by using two hierarchical encoders. The refinement stage enhances the video’s details by incorporating a brief additional text prompt and improves the resolution to 1280x720.
To improve the diversity, the researchers collected around 35 million single-shot text-video pairs and 6 billion text-image pairs to optimize the model. The data covers a wide range of categories and styles, such as 2D culture, 3D cartoon, Chinese painting, and woodcut.
The researchers conducted extensive experiments to investigate the underlying principles of I2VGen-XL and compared it with current top methods, such as SVD and VideoComposer. They demonstrated that I2VGen-XL can simultaneously enhance the semantic accuracy, continuity of details, and clarity of generated videos.
The researchers also released the source code and models of I2VGen-XL on their GitHub. They also provided a number of video samples generated by I2VGen-XL from different text-image inputs.
Why it’s important:
Image-to-video synthesis is a challenging and valuable task that has many potential applications, such as video editing, animation, education, and entertainment. However, existing methods often suffer from low quality, semantic inconsistency, or lack of diversity. I2VGen-XL is a breakthrough that leverages the power of diffusion models and large-scale data to produce realistic and diverse videos from text-based images. It opens up new possibilities for video creation and manipulation.
Google Cloud introduces MedLM, a family of AI models for healthcare
Google Cloud has announced the launch of MedLM, a suite of foundation models fine-tuned for healthcare industry use cases. MedLM is built on Med-PaLM 2, a large language model that achieved expert-level scores on medical-licensing-exam-style questions. MedLM is now available to Google Cloud customers in the United States and in preview in certain other markets worldwide.
The details:
MedLM currently offers two models: a larger one for complex tasks and a medium one for scaling across tasks. The models can handle various applications such as answering medical questions, drafting summaries, and searching through medications.
MedLM is being used by several healthcare organizations to improve their solutions, such as HCA Healthcare for creating medical notes, BenchSci for accelerating drug development, Accenture for automating manual processes, and Deloitte for enhancing member engagement.
MedLM is part of Google’s broader effort to bring generative AI to healthcare, along with Med-PaLM 2 and Gemini, a multimodal model that can process text, images, and audio. Google is also collaborating with practitioners, researchers, and health and life science organizations to ensure a safe and responsible use of this technology.
Why it’s important:
MedLM is a promising example of how AI can help transform healthcare and medicine, by enabling professionals with faster, more accurate, and more scalable tools. MedLM also demonstrates the potential of foundation models, which can be adapted to different domains and tasks with minimal data and computation. By making MedLM available to more customers, Google hopes to spur innovation and value in the healthcare industry.
Midjourney Alpha: A New Way to Generate Images with AI
Midjourney, a popular image-generating AI service that has over 17.5 million users on Discord, has launched an alpha version of its website that allows users to create images directly on the web. The website has a simple and clean interface, with features such as prompt settings, image history, likes, and prompt search. The website is currently limited to users who have generated more than 10,000 images on Discord, but the company plans to make it available to more people soon. Midjourney is one of the leading AI image generation services, but it also faces legal challenges from artists and creators over its use of public imagery to train its models.
Meta’s Ray-Ban smart glasses can now answer questions based on what you see
Meta, the company formerly known as Facebook, has launched an early access program for its Ray-Ban smart glasses that lets users ask questions based on what they are looking at. The glasses use a camera and microphones to capture and process the environment, and then provide contextual information using generative AI. For example, you can ask Meta AI what ingredients are in a plate of food, or what pants to wear with a shirt. The new feature is one of the most promising applications of multimodal AI, and it makes the Ray-Ban smart glasses even more useful and futuristic.
Curio launches Grok, an AI toy that can talk to kids
Grimes, the musician and mother of Elon Musk’s children, has teamed up with Curio, a Silicon Valley start-up, to create Grok, an AI-powered plush toy that can converse with your child. Grok is the first in a line of toys that use OpenAI’s technology to enable long-running, interactive conversations. The toy is designed to spark imagination, reduce screen time, and serve as an assistive technology for parenting. Grok is available for preorder now for $99 and will ship early next year.
Axel Springer and OpenAI partner to boost journalism with AI
Axel Springer, a leading media and technology company, has announced a global partnership with OpenAI, a research organization dedicated to creating artificial intelligence (AI) that can benefit humanity. The partnership aims to strengthen independent journalism in the age of AI by enriching users’ experience with ChatGPT, a conversational AI system powered by OpenAI’s technology. ChatGPT users will receive summaries of selected global news content from Axel Springer’s media brands, including POLITICO, BUSINESS INSIDER, and BILD. The partnership also supports Axel Springer’s existing AI-driven ventures and values the publisher’s role in contributing to OpenAI’s products. This marks a significant step in both companies’ commitment to leverage AI for enhancing content experiences and creating new financial opportunities that support a sustainable future for journalism.
Quick news
Google's GitHub Copilot rival, Duet AI, opens doors to developers (link)
Ashley: The AI-driven campaign caller revolutionizing political outreach (link)
AI empowered computing: Intel Core Ultra ushers in the AI PC era (link)
Instagram's creative tools get a GenAI-powered boost with background editing (link)
Pixie: Google's Gemini-Powered AI Assistant (link)
DeepMind AI outpaces human intellect: cracking a mathematical conundrum (link)
Midjourney V.6 amplifies creative control with natural language prompting (link)
Scientists poised to simulate human brain (link)
AI Newscasters Take the Stage (link)
Spotify Unveils 'AI Playlists' (link)
Be better with AI
In this section, we will provide you with comprehensive tutorials, practical tips, ingenious tricks, and insightful strategies for effectively employing a diverse range of AI tools.
How to Run ChatGPT-like LLMs Locally on Your Computer in 3 Easy Steps
Large Language Models (LLMs) are powerful tools for generating text and performing various natural language processing tasks. However, running them on your own computer can be challenging, especially if you don’t have an internet connection or a cloud service. Fortunately, there is a solution: llamafile, a tool that transforms LLM weights into executable binaries that can run locally on your Mac or Windows.
In this tutorial, you will learn how to use llamafile to run the open-source Mistral LLM.
To run these models locally, you only need to follow three easy steps:
Open a terminal.
Download the Mistral 7B llamafile for the model of your choice from Hugging Face. (link)
Or use this command in the Terminal:
curl -LO https://huggingface.co/jartine/llava-v1.5-7B-GGUF/resolve/main/mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile
Make the binary executable using the Terminal or Command Prompt.
cd ~/Downloads
chmod 755 mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile
Run the executable and access the web UI on your browser.
./mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile
And access the web UI through http://127.0.0.1:8080/
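Beyond the web UI, you can also query the local server from code. Here is a minimal Python sketch, assuming the bundled llama.cpp-style server exposes a /completion endpoint on port 8080 that accepts a JSON body with "prompt" and "n_predict" fields (check your llamafile version's docs, as these details may differ):

```python
import json
import urllib.request

# Build the request payload for the local llamafile server started above.
# "n_predict" caps the number of generated tokens (field name is an assumption
# based on the llama.cpp server API).
payload = {"prompt": "Explain mixture-of-experts models in one sentence.", "n_predict": 64}

req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=30) as resp:
        result = json.loads(resp.read())
        # The llama.cpp server returns the generated text under "content".
        print(result.get("content", ""))
except OSError as exc:
    # Server not running yet: start the .llamafile binary first (step 3 above).
    print(f"Could not reach the local server: {exc}")
```

This is handy for scripting batch prompts against the model without leaving the fully local setup.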
That’s it! You can now enjoy the benefits of running LLMs locally on your computer, without any internet connection or cloud service. You can use these models for personal, development, or research purposes, and explore their capabilities and limitations.
Llamafile is a game-changer in the world of LLMs, making them more accessible and user-friendly. If you want to learn more about llamafile, LLaVA 1.5, or Mistral 7B, you can check out the original article by Paolo Perazzo on Products for Humans.
Tools
🖼️ Midjourney Prompt Helper - tool that simplifies the MidJourney prompts (link)
📑 Storyboard Hero - Effortless video concept and storyboarding (link)
🔗 Taplio - Leverage AI to grow on LinkedIn. (link)
⚒️ Carboncopy - Craft ad copy, product descriptions, images, audio, and more. (link)
🧑🦰 Heygen - Text into polished videos with AI avatars and voices. (link)
📱 Thanos - All-In-One AI blogging solution. (link)
📝 tl;dv - The AI meeting assistant that takes your meeting notes for you. (link)
🎨 Jasper.ai - Content generator that helps you break through creative blocks (link)
🔮 Plus AI - Ultimate AI presentation tool for Google Slides (link)
We hope you enjoy this newsletter!
Please feel free to share it with your friends and colleagues.
Thank you for reading!