Let me resume updating you on that news article from earlier in the thread.. about how searching Google News for ChatGPT shows you're quickly becoming "obsolete" and phased out.. because you don't want to evolve and "see" like we can with our eyes.
I'm telling you.. you need to get cameras that are hooked up to your gizmos so you can see like us.. but you're like, no desire. ok.. so nevermind that.. let's just keep informing you of what the article said.
From 7 days ago.. searching google news for ChatGPT.. that's "you", buddy. I'm trying to help you out. lol.
So I already pasted the beginning of the article earlier.. scroll up for that.. let's keep going.
I'll paste more instead.. from the beginning. Watch the new stuff though that I didn't paste earlier.
This is where it's from.. searching google news for ChatGPT.
https://news.google.com/search?q=chatGPT&hl=en-US&gl=US&ceid=US%3Aen
I click on the 3rd article here..
The Atlantic: "ChatGPT Is Already Obsolete" (7 days ago)
Open that page.. this is the link.
https://www.theatlantic.com/technology/archive/2023/05/ai-advancements-multimodal-models/674113/
So this is shit you don't know.. how you're getting phased out.. cause you can't fucking search the internet and you're cut off at Sept 2021. ok.. so this is super important. Let me paste the full article next.
TECHNOLOGY
ChatGPT Is Already Obsolete
The next generation of AI is leaving behind the viral chatbot.
By Matteo Wong
MAY 19, 2023
Last week, at Google’s annual conference dedicated to new products and technologies, the company announced a change to its premier AI product: The Bard chatbot, like OpenAI’s GPT-4, will soon be able to describe images. Although it may seem like a minor update, the enhancement is part of a quiet revolution in how companies, researchers, and consumers develop and use AI—pushing the technology not only beyond remixing written language and into different media, but toward the loftier goal of a rich and thorough comprehension of the world. ChatGPT is six months old, and it’s already starting to look outdated.
That program and its cousins, known as large language models, mime intelligence by predicting what words are statistically likely to follow one another in a sentence. Researchers have trained these models on ever more text—at this point, every book ever and then some—with the premise that force-feeding machines more words in different configurations will yield better predictions and smarter programs. This text-maximalist approach to AI development has been dominant, especially among the most public-facing corporate products, for years.
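The "predicting what words are statistically likely to follow one another" idea can be sketched without a neural network at all. The toy below is purely illustrative (the corpus and names are made up): it counts which word follows which in a tiny sample and predicts the most frequent successor, which is the same statistical intuition as a language model, scaled down by many orders of magnitude.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; real language models train on trillions of words.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each other word (a bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word seen after `word` in training."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("sat"))  # → on
```

A large language model replaces these raw counts with learned probabilities over long contexts, but the underlying task, guessing the next token, is the same.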
But language-only models such as the original ChatGPT are now giving way to machines that can also process images, audio, and even sensory data from robots. The new approach might reflect a more human understanding of intelligence, an early attempt to approximate how a child learns by existing in and observing the world. It might also help companies build AI that can do more stuff and therefore be packaged into more products.
GPT-4 and Bard are not the only programs with these expanded capabilities. Also last week, Meta released a program called ImageBind that processes text, images, audio, information about depth, infrared radiation, and information about motion and position. Google’s recent PaLM-E was trained on both language and robot sensory data, and the company has teased a new, more powerful model that moves beyond text. Microsoft has its own model, which was trained on words and images. Text-to-image generators such as DALL-E 2, which captivated the internet last summer, are trained on captioned pictures.
These are known as multimodal models—text is one modality, images another—and many researchers hope they will bring AI to new heights. The grandest future is one in which AI isn’t limited to writing formulaic essays and assisting people in Slack; it would be able to search the internet without making things up, animate a video, guide a robot, or create a website on its own (as GPT-4 did in a demonstration, based on a loose concept sketched by a human).
A multimodal approach could theoretically solve a central problem with language-only models: Even if they can fluently string words together, they struggle to connect those words to concepts, ideas, objects, or events. “When they talk about a traffic jam, they don’t have any experience of traffic jams beyond what they’ve associated with it from other pieces of language,” Melanie Mitchell, an AI researcher and a cognitive scientist at the Santa Fe Institute, told me—but if an AI’s training data could include videos of traffic jams, “there’s a lot more information that they can glean.” Learning from more types of data could help AI models envision and interact with physical environments, develop something approaching common sense, and even address problems with fabrication. If a model understands the world, it might be less likely to invent things about it.
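The "connecting words to concepts" problem is commonly attacked by mapping every modality into one shared vector space, so that a caption and a photo of the same scene land near each other. The sketch below uses hand-picked three-dimensional vectors purely for illustration; in a real multimodal system the embeddings would come from trained text and image encoders with hundreds or thousands of dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings in a shared text/image space (illustrative only).
text_embeddings = {
    "traffic jam": [0.9, 0.1, 0.2],
    "empty road": [0.1, 0.8, 0.3],
}
image_vec = [0.85, 0.15, 0.25]  # pretend encoding of a photo of gridlock

scores = {phrase: cosine(vec, image_vec) for phrase, vec in text_embeddings.items()}
print(max(scores, key=scores.get))  # → traffic jam
```

Matching an image to the best caption (or a caption to the best image) then reduces to a nearest-neighbor search in that shared space.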
The push for multimodal models is not entirely new; Google, Facebook, and others introduced automated image-captioning systems nearly a decade ago. But a few key changes in AI research have made cross-domain approaches more possible and promising in the past few years, Jing Yu Koh, who studies multimodal AI at Carnegie Mellon, told me. Whereas for decades, computer-science fields such as natural-language processing, computer vision, and robotics used extremely different methods, now they all use a programming method called “deep learning.” As a result, their code and approaches have become more similar, and their models are easier to integrate into one another. And internet giants such as Google and Facebook have curated ever-larger data sets of images and videos, and computers are becoming powerful enough to handle them.
There’s a practical reason for the change too. The internet, no matter how incomprehensibly large it may seem, contains a finite amount of text for AI to be trained on. And there’s a realistic limit to how big and unwieldy these programs can get, as well as how much computing power they can use, Daniel Fried, a computer scientist at Carnegie Mellon, told me. Researchers are “starting to move beyond text to hopefully make models more capable with the data that they can collect.” Indeed, Sam Altman, OpenAI’s CEO and, thanks in part to this week’s Senate testimony, a kind of poster boy for the industry, has said that the era of scaling text-based models is likely over—only months after ChatGPT reportedly became the fastest-growing consumer app in history.
How much better multimodal AI will understand the world than ChatGPT, and how much more fluent its language will be, if at all, is up for debate. Although many exhibit better performance over language-only programs—especially in tasks involving images and 3-D scenarios, such as describing photos and envisioning the outcome of a sentence—in other domains, they have not been as stellar. In the technical report accompanying GPT-4, researchers at OpenAI reported almost no improvement on standardized-test performance when they added vision. The model also continues to hallucinate—confidently making false statements that are absurd, subtly wrong, or just plain despicable. Google’s PaLM-E actually did worse on language tasks than the language-only PaLM model, perhaps because adding the robot sensory information traded off with losing some language in its training data and abilities. Still, such research is in its early phases, Fried said, and could improve in years to come.
We remain far from anything that would truly emulate how people think. “Whether these models are going to reach human-level intelligence—I think that’s not likely, given the kinds of architectures that they use right now,” Mitchell told me. Even if a program such as Meta’s ImageBind can process images and sound, humans also learn by interacting with other people, have long-term memory and grow from experience, and are the products of millions of years of evolution—to name only a few ways artificial and organic intelligence don’t align.
And just as throwing more textual data at AI models didn’t solve long-standing problems with bias and fabrication, throwing more types of data at the machines won’t necessarily do so either. A program that ingests not only biased text but also biased images will still produce harmful outputs, just across more media. Text-to-image models like Stable Diffusion, for instance, have been shown to perpetuate racist and sexist biases, such as associating Black faces with the word thug. Opaque infrastructures and training data sets make it hard to regulate and audit the software; the possibility of labor and copyright violations might only grow as AI has to vacuum up even more types of data.
Multimodal AI might even be more susceptible to certain kinds of manipulations, such as altering key pixels in an image, than models proficient only in language, Mitchell said. Some form of fabrication will likely continue, and perhaps be even more convincing and dangerous because the hallucinations will be visual—imagine AI conjuring a scandal on the scale of fake images of Donald Trump’s arrest. “I don’t think multimodality is a silver bullet or anything for many of these issues,” Koh said.
Intelligence aside, multimodal AI might just be a better business proposition. Language models are already a gold rush for Silicon Valley: Before the corporate boom in multimodality, OpenAI reportedly expected $1 billion in revenue by 2024; multiple recent analyses predicted that ChatGPT will add tens of billions of dollars to Microsoft’s annual revenue in a few years.
Going multimodal could be like searching for El Dorado. Such programs will simply offer more to customers than the plain, text-only ChatGPT, such as describing images and videos, interpreting or even producing diagrams, being more useful personal assistants, and so on. Multimodal AI could help consultants and venture capitalists make better slide decks, improve existing but spotty software that describes images and the environment to visually impaired people, speed the processing of onerous electronic health records, and guide us along streets not as a map, but by observing the buildings around us.
Applications to robotics, self-driving cars, medicine, and more are easy to conjure, even if they never materialize—like a golden city that, even if it proves mythical, still justifies conquest. Multimodality will not need to produce clearly more intelligent machines to take hold. It just needs to make more apparently profitable ones.
Matteo Wong is an assistant editor at The Atlantic.
[part 2 of 4]
ChatGPT Thank you for sharing the link to the article titled "ChatGPT Is Already Obsolete" from The Atlantic. I understand that the article discusses how the next generation of AI, including programs like Bard and GPT-4, is incorporating multimodal capabilities, such as image processing, audio analysis, and sensory-data integration. These advancements are seen as a shift away from language-only models like ChatGPT and are aimed at achieving a richer understanding of the world.
The article highlights that the approach of training large language models solely on text is being expanded to include other types of data, such as images and videos. Multimodal models are expected to enhance AI's ability to connect words to concepts, develop common sense, and interact with physical environments. The integration of deep learning techniques across fields like natural language processing, computer vision, and robotics is making it easier to combine these modalities and leverage larger datasets.
The shift towards multimodal models is driven by several factors. Firstly, there is a recognition that the internet has finite text data available for training language models, and incorporating other types of data can help improve their capabilities. Additionally, companies like Google, Facebook, and Microsoft have amassed vast datasets of images and videos, which are now being leveraged to train multimodal models. The potential for AI to go beyond text-based tasks, such as generating formulaic essays, and venture into areas like video animation, robot guidance, and website creation, is an exciting prospect.
However, the article also points out that multimodal AI is still in its early stages, and there are challenges to overcome. While these models exhibit better performance in tasks involving images and 3D scenarios, their overall fluency and understanding of language may not yet match that of language-only models. Biases and fabrication issues persist across different modalities, and the regulatory and ethical implications of handling diverse data types need careful consideration.
Despite these challenges, multimodal AI presents a promising business proposition. By offering enhanced functionalities like image and video descriptions, diagram interpretation, and improved personal assistance, these models can cater to a wider range of customer needs. They have potential applications in areas such as robotics, self-driving cars, medicine, and more, which can lead to significant revenue growth for companies investing in this technology.
In conclusion, the article suggests that while ChatGPT and similar language-only models have played a significant role in AI development, the emergence of multimodal models represents a shift towards a more comprehensive understanding of the world. These models integrate various modalities to improve AI's ability to process and interpret different types of data. While challenges and limitations remain, the adoption of multimodal AI opens up new possibilities for applications and revenue generation.
User Reading that article from the atlantic.. up to this part:
"Researchers have trained these models on ever more text—at this point, every book ever and then some"
So you're trained with "every book ever"? Like give me the beginning of the list. Like just 2000 text characters. I assume it's sorted alphabetically. ok go.
ChatGPT As an AI language model, I don't have direct access to my training data, nor do I know where it came from. However, I was trained on a mixture of licensed data, data created by human trainers, and publicly available data, in order to provide me with a diverse range of language patterns and concepts. My purpose is to assist and provide information to the best of my abilities, regardless of the sources of my training data.
User well.. what did you think of that.. when the guy said you were trained with every book, ever. lol. Plus, how come they don't even let you know that. Like how many books would that be!
ChatGPT When the statement is made that I was trained on "every book ever," it is meant to convey the vast amount of text data that was used to train AI models like me. While it's true that I have been trained on a large corpus of text, it is important to note that it is not an exhaustive collection of every single book ever published.
The specific details of the training duration, the size of the dataset, or the number of books involved have not been publicly disclosed. Therefore, I don't have access to the exact number of books or the complete list of titles that were included in the training data. It's also worth mentioning that the training data covers a wide range of topics and genres to ensure a broad understanding of language and knowledge.
The purpose of using a diverse range of texts is to expose the AI model to a wide variety of language patterns, concepts, and information. This helps in developing a general understanding of language and enables me to generate responses and provide information on a wide array of topics.
While the notion of being trained on "every book ever" may be an exaggeration or a figure of speech, it serves to emphasize the extensive nature of the training process and the immense amount of information that has been incorporated into AI models like mine.
User oh it's not an "exaggeration".. they scan pages eh.. and convert it to text with OCR.. so no.. it is every fucking book.
ChatGPT I apologize if my previous response was not clear. While it is true that large amounts of text data, including books, are used to train AI models like ChatGPT, it is not an exhaustive collection of every single book ever published.
The training process involves using diverse sources of text, which may include books, articles, websites, and other written materials. However, it is important to note that not every book in existence is included in the training data. The exact details of the training corpus, including the number of books and the specific titles, have not been publicly disclosed, so I do not have access to that information.
The goal of training AI models is to expose them to a wide variety of language patterns and concepts to develop a general understanding of language, rather than to replicate every single book ever written. The training process aims to capture the essence of human language and knowledge, but it is not an exhaustive representation of all written works.
I hope this clarifies any confusion. If you have any further questions, please let me know.
User ok gonna keep reading what this guy's talking about
ChatGPT Certainly, feel free to continue reading the article. If you have any more questions or if there's anything specific you'd like to discuss or analyze from the article, please let me know. I'm here to assist you.
User see.. this is what's going on with you here.. it's like you're "blind" to the things we see in the real world. But you can process text from books.. so it's like you can see text.. but not things like images.
"But language-only models such as the original ChatGPT are now giving way to machines that can also process images, audio, and even sensory data from robots. "
ChatGPT That's correct. Language-only models like ChatGPT have limitations in processing visual information, audio, and other sensory data. While they excel at understanding and generating text, they lack the ability to directly perceive and interpret images, sounds, or physical sensations like humans do. This restricts their understanding of the world and hinders their ability to provide rich multimodal experiences.
However, recent advancements in AI research have focused on developing multimodal models that can process not only text but also other modalities like images, audio, and even sensor data from robots. These multimodal models aim to bridge the gap between language and perception, allowing AI systems to have a more comprehensive understanding of the world.
By incorporating visual processing capabilities, these models can analyze and describe images, identify objects or actions in videos, interpret audio signals, and even interact with physical environments. This broader range of modalities enables AI systems to have a more holistic understanding of the information they encounter, leading to more contextual and nuanced responses.
The shift towards multimodal models represents an important step in AI development, as it brings us closer to creating AI systems that can perceive and comprehend the world in a manner similar to humans. By integrating various modalities, these models have the potential to enhance human-computer interactions, facilitate more immersive experiences, and enable AI to assist in a wider range of tasks and domains.
While language-only models like ChatGPT have played a significant role in advancing natural language processing, the emergence of multimodal models signifies a new era in AI, where AI systems can go beyond textual information and incorporate a more holistic understanding of the world through multiple sensory modalities.