Data and AI newsletter: April 2023
Welcome to our April Data & AI newsletter!
In this edition, we have an array of news and insights to keep you up-to-date. We cover key highlights from the recently released 2023 AI Index Report by Stanford, which sheds light on the growing dominance of industry over academia in producing significant machine-learning models. We also introduce LMQL, a new open-source programming language that enhances the capabilities of language models like ChatGPT, GPT-4, and future models.
We also share exciting updates on BloombergGPT, a large language model specifically designed for finance, and AWS's new open-source project, Data on EKS, which provides templates and guidance for deploying data workloads on Amazon Elastic Kubernetes Service.
And that's not all! Are you curious? Dive into the newsletter below.
NEWS AND INSIGHTS
“Until 2014, most significant machine learning models were released by academia. Since then, industry has taken over. In 2022, there were 32 significant industry-produced machine learning models compared to just three produced by academia. Building state-of-the-art AI systems increasingly requires large amounts of data, compute, and money, resources that industry actors inherently possess in greater amounts compared to nonprofits and academia.”
The AI Index is an independent initiative at the Stanford Institute for Human-Centered Artificial Intelligence (HAI), led by the AI Index Steering Committee, an interdisciplinary group of experts from across academia and industry. The annual report tracks, collates, distills, and visualises data relating to artificial intelligence, enabling decision-makers to take meaningful action to advance AI responsibly and ethically with humans in mind.
LMQL is a ready-to-use, novel open-source programming language and platform for interacting with large language models (LLMs). Combining prompts, constraints, and scripting, LMQL elevates the capabilities of LLMs like ChatGPT, GPT-4, and any future model.
LMQL is a declarative, SQL-like language based on Python, extending static text prompting with control flow, constraint-guided decoding, and tool augmentation. This form of scripting greatly simplifies multi-part prompting flows while requiring very little code. LMQL also supports high-level, logical constraints, allowing users to steer model generation and avoid costly re-querying and validation.
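As a brief, hedged sketch of what this looks like in practice (the model name, prompt, and constraint below are illustrative, not taken from the LMQL documentation): a query combines a decoding clause, a prompt with a template variable, a model, and a logical constraint on the generated text.

```
# Illustrative LMQL query: the decoder, prompt, model, and
# constraint are example choices, not prescribed by LMQL.
argmax
    "Q: What is the capital of France?\n"
    "A: [ANSWER]"
from
    "openai/text-davinci-003"
where
    len(ANSWER) < 20
```

Here the `where` clause is the high-level constraint mentioned above: it bounds the length of `ANSWER` during decoding itself, so the runtime never has to discard and re-query an overlong completion.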
Bloomberg has released BloombergGPT, a new large language model (LLM) that has been trained on enormous amounts of financial data and can help with a range of natural language processing (NLP) activities for the financial sector. BloombergGPT is a cutting-edge AI that can evaluate financial data quickly to help with risk assessments, gauge financial sentiment, and possibly even automate accounting and auditing activities.
AWS has released Data on EKS (DoEKS), an open-source project providing templates, guidance, and best practices for deploying data workloads on Amazon Elastic Kubernetes Service (EKS). While the main focus is on running Apache Spark on Amazon EKS, blueprints also exist for other data workloads such as Ray, Apache Airflow, Argo Workflows, and Kubeflow.
INTERESTING AI STARTUPS
Look AI Ventures fund (LAIV) SICAV focuses on pre-seed and seed-stage AI startups from Europe, particularly the CEE region. The team aims to build a portfolio of at least 35 startups over the next three years. The target investment ticket per startup is €250K, with the possibility of reinvesting up to €1 million.
“Digital First AI is a personal growth assistant that delivers personalized marketing strategy within minutes and helps its users to execute it using AI. Based on information about the business, Digital First then recommends a list of marketing activities that will allow companies to scale revenues.
Last year, the startup raised $1.1M in pre-seed funding for its global expansion, and is currently used by over 4,000 user companies and over 2,000 clients from 60 countries,” writes The Recursive.
💡 Blog article: From Deep to Long Learning
Researchers are working to increase sequence length in machine learning foundation models to enable learning from longer contexts and multiple media sources.
New models like S4, H3, and Hyena have been developed to address the quadratic scaling of attention layers in Transformers and show promising results in matching Transformers on perplexity and downstream tasks.
Despite the growing demand for interactive AI systems, there have been few comprehensive studies on human-AI interaction in visual understanding (e.g., segmentation). Inspired by the development of prompt-based universal interfaces for LLMs, this paper presents SEEM, a promptable, interactive model for Segmenting Everything Everywhere all at once in an image.
The Institute for Computer Science, Artificial Intelligence and Technology (INSAIT) is launching a series of lectures on the hottest topics, such as Neurosymbolic AI, Generative AI, and Geometric Deep Learning, delivered directly by the technology leaders, researchers, and entrepreneurs who create them. The upcoming talk will be given by Prof. Martin Odersky, inventor of the Scala programming language, in Sofia on 27.04.2023. To attend, register at https://techseries.insait.ai/.
Thanks for reading our monthly digest! If you enjoy it, we'd love your help spreading the word! Share it with friends and colleagues who might benefit from it.
You can find the topics from our previous newsletters at: