This is leading-edge research that aims to bring quantum theory into the AI field, unlocking new horizons. In doing so, we will tackle the intractability that blocks major breakthroughs and boost creativity, driving innovation at scale. Our motivations are ambitious: we are trying to push the boundaries beyond the early-adoption phase and democratize AI, making it truly accessible to everyone around the globe. We are also looking to involve the community, so everyone takes part in this revolutionary work.
The project's goals are twofold:
1-Simulating quantum theory, particularly superposition and entanglement, and compressing the data into "state vectors".
2-As a result, we will look to build a multi-disciplinary digital entity with a superficial cognitive system at scale.
1)
First, how are we going to apply quantum physics theories into AI?
-We will draw inspiration from two fundamental quantum concepts, entanglement and superposition, refining them to fit our purpose and goals and eventually spark a major breakthrough.
Applying entanglement will help us predict tokens that are far apart in our dataset simultaneously, saving endless time and significantly reducing the computational footprint. In this context, entanglement depends on superposition: it cannot be achieved when training models unless the data they are trained on is superposed. What do I mean by that?
Superposition consists of the correlation of tokens. "Main words" should be represented by a dense, multidimensional vector that encapsulates the entire lexicon related to them. For example, if the machine encounters the word "vehicle" in a sequence, it should be able to simultaneously generate related words such as car, bus, Ford, Tesla, etc., regardless of how far apart they are from the "main word" in the dataset. The same should happen with every word in the same lexical field. The data should come pre-embedded in that sense, allowing the model to run an initial scan and generate all related words. This is not prediction; the machine will only generate roughly all the words in the dataset in a random fashion. Then, based on the semantics learned and the correlations between words, it should predict "context", meaning full phrases and sentences instead of single tokens. This may require human intervention and supervised learning initially. However, I'm confident that sooner rather than later, machines will be able to predict contexts on their own after learning from enough samples.
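The "superposed", pre-embedded representation described above can be illustrated with a toy sketch: every word carries a dense vector, and scanning for one word surfaces all words in the same lexical field by cosine similarity. The vectors, vocabulary, and threshold below are invented for illustration only; a real system would learn high-dimensional embeddings from the dataset itself.

```python
import numpy as np

# Toy pre-embedded vocabulary (hypothetical 4-d vectors for illustration;
# a real model would learn dense, high-dimensional embeddings from data).
vocab = {
    "vehicle": np.array([0.9, 0.8, 0.1, 0.0]),
    "car":     np.array([0.85, 0.75, 0.15, 0.05]),
    "bus":     np.array([0.8, 0.7, 0.2, 0.1]),
    "tesla":   np.array([0.7, 0.9, 0.1, 0.0]),
    "banana":  np.array([0.0, 0.1, 0.9, 0.8]),
}

def related_words(word, threshold=0.95):
    """Return every vocabulary word whose embedding points in nearly the
    same direction as `word` (cosine similarity >= threshold)."""
    v = vocab[word]
    out = []
    for w, u in vocab.items():
        if w == word:
            continue
        sim = float(v @ u / (np.linalg.norm(v) * np.linalg.norm(u)))
        if sim >= threshold:
            out.append(w)
    return out

# Encountering "vehicle" surfaces its whole lexical field at once,
# no matter where those words sit in the dataset.
print(related_words("vehicle"))  # ['car', 'bus', 'tesla']
```

Distance in the corpus plays no role here: retrieval depends only on the geometry of the embedding space, which is the point of the "superposed" representation.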
2)
We're looking to build a multi-disciplinary entity that is rational, dynamic, and fun, so you can interface with it and have real-time speech-to-speech discussions. It can serve as an assistant, a mentor, or simply a best friend with whom you argue a lot but still cherish.
To achieve this, we will need to go through three crucial steps inspired by the "DIFFUSION MODELS ARE REAL-TIME GAME ENGINES" paper.
1: Train the agent in an interactive environment with given parameters. In our case, the parameters will be the following: the conversation's dynamic memory content, the rendered screen pixels, the agent's rendering logic, and the agent's logic given the user's input. Our environment here is a speech-to-speech interface where different users hold a variety of discussions with the agent. This is where the community will be much needed, playing a crucial role in building a diverse dataset from scratch.
2: The entity has a body. Hence, we will need to produce numerous animations, body movements, gestures, and facial expressions. We will then stitch these to their equivalent speech data to create speech-image pairs.
3: The community's interactions with the agent in our exclusive environment, alongside the animations, will be collected as the data on which a generative model is trained. This will likely be a text-to-image diffusion model that we re-purpose and condition on both speech and emotion. We will implement a sophisticated dual-channel input system, combining speech and emotion embeddings using cross-attention.
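The dual-channel conditioning in step 3 can be sketched as cross-attention in which the model's image features attend to a concatenated sequence of speech and emotion tokens. All shapes, names, and the random weights below are illustrative assumptions, not the project's actual architecture; a production model would use learned projections inside a full diffusion network.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # shared embedding width (illustrative)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, Wq, Wk, Wv):
    """Image features (queries) attend to conditioning tokens (context)."""
    Q, K, V = queries @ Wq, context @ Wk, context @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled dot-product
    return softmax(scores, axis=-1) @ V

# Hypothetical dual-channel conditioning: 5 speech tokens + 2 emotion tokens,
# concatenated into one context sequence for cross-attention.
speech  = rng.normal(size=(5, d))
emotion = rng.normal(size=(2, d))
context = np.concatenate([speech, emotion], axis=0)

image_features = rng.normal(size=(16, d))  # e.g. flattened latent patches
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

out = cross_attention(image_features, context, Wq, Wk, Wv)
print(out.shape)  # (16, 8): each image token now mixes speech and emotion info
```

Concatenating the two streams is the simplest way to realize a dual-channel input; alternatives include separate cross-attention layers per channel with their outputs summed.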
-A portion of the funds will primarily go toward a full-time team and a decent office to ensure consistent operation.
-Acquire the computational resources needed to power our experiments and run trials at scale.
-Collaborate with leading labs to bring in their expertise to the table, making sure we stay on the right course.
TO BE ANNOUNCED!
Currently in talks with several people. I'm trying to get the best to join, so it might take a little time.