
Understanding the power of Lifelong Learning through the Efficient Lifelong Learning Algorithm (ELLA) and VOYAGER

I encourage you to read Part 1: The Origins of LLML if you haven't already, where we saw LLML applied to reinforcement learning. Now that we've covered where LLML came from, we can apply it to other areas, specifically supervised multi-task learning, to see some of LLML's true power.
Supervised LLML: The Efficient Lifelong Learning Algorithm
The Efficient Lifelong Learning Algorithm aims to train a model that can excel at multiple tasks at once. ELLA operates in the multi-task supervised learning setting, with multiple tasks T_1..T_n, with features X_1..X_n and labels y_1..y_n corresponding to each task (whose sizes will likely vary between tasks). Our goal is to learn functions f_1, ..., f_n, where f_t: X_t -> y_t. Essentially, each task has a function that takes the task's corresponding features as input and outputs its y values.
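To pin down notation (my own choice, intended to match the paper's setting): task t comes with data $(X_t, y_t) = \{(x_{t,i}, y_{t,i})\}_{i=1}^{n_t}$, with $x_{t,i} \in \mathbb{R}^d$, and we learn a predictor $f_t(x) = f(x;\, \theta_t)$ parameterized by a task-specific vector $\theta_t \in \mathbb{R}^d$.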
At a high level, ELLA maintains a shared basis of 'knowledge' vectors for all tasks, and as new tasks are encountered, ELLA uses knowledge from the basis, refined with the data from the new task. Furthermore, in learning this new task, more information is added to the basis, improving learning for all future tasks!
Ruvolo and Eaton tested ELLA in three settings: landmine detection, facial expression recognition, and exam score prediction! As a small taste to get you excited about ELLA's power, it was up to 1,000x more time-efficient on these datasets while sacrificing next to no performance!
Now, let's dive into the technical details of ELLA! The first question that may arise when attempting to derive such an algorithm is:
How exactly do we find what information in our knowledge base is relevant to each task?
ELLA does so by modifying our f functions for each task t. Instead of being a function f(x) = y, we now have f(x, θ_t) = y, where θ_t is unique to task t and can be represented as a linear combination of the knowledge base vectors. With this technique, all tasks are mapped into the same basis space, and we can measure similarity using simple linear distance!
Now, how do we derive θ_t for each task?
This question is the core insight of the ELLA algorithm, so let's take a close look at it. We represent the knowledge basis vectors as a matrix L. Given weight vectors s_t, we represent each θ_t as Ls_t, a linear combination of the basis vectors.
Our goal is to minimize the loss for each task while maximizing the shared information used between tasks. We achieve this with the objective function e_T that we try to minimize:
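Following Ruvolo and Eaton's formulation (reconstructed here, so check the paper for the exact notation):

$$
e_T(L) = \frac{1}{T} \sum_{t=1}^{T} \min_{s_t} \left\{ \frac{1}{n_t} \sum_{i=1}^{n_t} \ell\big(f(x_{t,i};\, L s_t),\, y_{t,i}\big) + \mu \lVert s_t \rVert_1 \right\} + \lambda \lVert L \rVert_F^2
$$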
where ℓ is our chosen loss function, μ and λ are regularization weights, and n_t is the number of data points for task t.
Essentially, the first term accounts for our task-specific loss, the second encourages our weight vectors s_t to be sparse, and the last term keeps our basis vectors small.
**This equation carries two inefficiencies (see if you can spot them)! The first is that the equation depends on all previous training data (specifically the inner sum), which we can imagine is incredibly cumbersome. We alleviate this first inefficiency with a second-order Taylor approximation of the equation around each task's optimal model. The second inefficiency is that we need to recompute every s_t to evaluate a single instance of L. We eliminate this by removing the minimization over s_t and instead computing s_t only when task t was last interacted with. I encourage you to read the original paper for a more detailed explanation!**
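Concretely, after the Taylor approximation and fixing each s_t at the value computed when its task was last seen, the objective becomes (again following the paper, with details hedged):

$$
\hat{e}_T(L) = \frac{1}{T} \sum_{t=1}^{T} \Big[ \mu \lVert s_t \rVert_1 + \lVert \theta_t - L s_t \rVert^2_{D_t} \Big] + \lambda \lVert L \rVert_F^2, \qquad \lVert v \rVert^2_{D} = v^{\top} D\, v,
$$

where θ_t is the single-task optimum and D_t re-weights the distance to it (both are defined below).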
Now that we have our objective function, we want a method to optimize it!
In training, we treat each iteration as a unit in which we receive a batch of training data from a single task, then compute s_t, and finally update L. At the start of our algorithm, we set T (our number-of-tasks counter), A, b, and L to zeros. Now, for each batch of data, we branch based on whether the data comes from a seen or unseen task.
If we encounter data from a new task, we add 1 to T and initialize X_t and y_t for this new task, setting them equal to our current batch of X and y.
If we encounter data from a task we've already seen, the process gets a bit more complex. We append the new X and y to our current memory X_t and y_t (by running through all the data, we end up with a complete set of X and y for each task!). We also subtract this task's previous contribution from our A and b values (I'll explain this later, just remember it for now!).
Next, we set (θ_t, D_t) equal to the output of our base single-task learner on the task's data. We then check whether to end the loop (i.e., whether we have seen all of the training data). If not, we move on to computing s_t and updating L.
To compute s_t, we use the optimal model θ_t obtained from the task's data; how it is computed depends on our specific task and loss function.
We then compute D_t, and initialize any all-zero columns of L (which occur if a certain basis vector goes unused) either randomly or to one of the θ_t's. The form of D_t depends on the base learner; linear regression and logistic regression each have their own closed form, shown below.
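As a reference (reconstructed from the paper's formulation, so treat the constants as approximate), for linear regression with squared loss

$$
D_t = \frac{1}{2 n_t} \sum_{i=1}^{n_t} x_{t,i}\, x_{t,i}^{\top},
$$

and for logistic regression

$$
D_t = \frac{1}{2 n_t} \sum_{i=1}^{n_t} \sigma_i (1 - \sigma_i)\, x_{t,i}\, x_{t,i}^{\top}, \qquad \sigma_i = \sigma(\theta_t \cdot x_{t,i}).
$$

In both cases, D_t is proportional to the Hessian of the single-task loss evaluated at θ_t; it tells us in which directions θ_t can be approximated loosely and in which it cannot.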
Then, we compute s_t using L by solving an L1-regularized regression problem:
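Concretely (a hedged reconstruction of the paper's step), s_t solves

$$
s_t = \arg\min_{s} \; \mu \lVert s \rVert_1 + \lVert \theta_t - L s \rVert^2_{D_t},
$$

which is a Lasso-style problem and can be handled by standard sparse-coding solvers.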
For our final step of updating L, we take the gradient of the approximated objective with respect to L, set it to zero, and solve for L in closed form. We then output the updated column-wise vectorization of L as the solution of a linear system built from two running quantities, A and b. To avoid summing over all tasks to recompute A and b, we construct them incrementally as each task arrives (this is why we subtracted the task's old contribution earlier!).
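In update form (a rough reconstruction of the paper's incremental step; the exact scaling constants may differ):

$$
A \leftarrow A + (s_t s_t^{\top}) \otimes D_t, \qquad b \leftarrow b + \operatorname{vec}\!\big(s_t^{\top} \otimes (\theta_t^{\top} D_t)\big),
$$

$$
\operatorname{vec}(L) \leftarrow \Big(\tfrac{1}{T} A + \lambda I\Big)^{-1} \tfrac{1}{T}\, b,
$$

and whenever a previously seen task is revisited, its old contribution is subtracted from A and b first (the "negative" update mentioned above).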
Once we've iterated through all the batches, we've learned all tasks and we're finished!
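To make the loop concrete, here is a rough, simplified sketch in Python for the linear regression case, written only from the description above. The class name, hyperparameters, and the plain (unweighted) Lasso step are my own simplifications, not Ruvolo and Eaton's reference implementation.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

class SimpleELLA:
    def __init__(self, d, k, lam=1e-3, mu=1e-3):
        self.d, self.k = d, k            # feature dimension, number of basis vectors
        self.lam, self.mu = lam, mu      # penalty on L, sparsity penalty on s_t
        self.L = np.random.randn(d, k)   # shared knowledge basis
        self.A = np.zeros((d * k, d * k))
        self.b = np.zeros(d * k)
        self.T = 0                       # number of tasks seen so far
        self.data = {}                   # task id -> accumulated (X, y)
        self.terms = {}                  # task id -> that task's last (A, b) contribution
        self.s = {}                      # task id -> weight vector s_t

    def fit_task(self, t, X, y):
        if t not in self.data:           # unseen task: count it and store its data
            self.T += 1
            self.data[t] = (X, y)
        else:                            # seen task: append data, subtract old contribution
            X0, y0 = self.data[t]
            self.data[t] = (np.vstack([X0, X]), np.concatenate([y0, y]))
            A_old, b_old = self.terms[t]
            self.A -= A_old
            self.b -= b_old
        X_t, y_t = self.data[t]

        # Single-task optimum theta_t, and D_t proportional to the Hessian of the squared loss.
        theta_t = Ridge(alpha=1e-6, fit_intercept=False).fit(X_t, y_t).coef_
        D_t = X_t.T @ X_t / (2 * len(y_t))

        # s_t from an L1-regularized fit of theta_t onto the current basis
        # (a plain Lasso here, ignoring the D_t weighting for simplicity).
        s_t = Lasso(alpha=self.mu, fit_intercept=False).fit(self.L, theta_t).coef_
        self.s[t] = s_t

        # Add this task's new contribution to A and b, then solve for L in closed form.
        A_t = np.kron(np.outer(s_t, s_t), D_t)
        b_t = np.kron(s_t, D_t @ theta_t)
        self.terms[t] = (A_t, b_t)
        self.A += A_t
        self.b += b_t
        A_bar = self.A / self.T + self.lam * np.eye(self.d * self.k)
        self.L = np.linalg.solve(A_bar, self.b / self.T).reshape(self.d, self.k, order="F")

    def predict(self, t, X):
        return X @ (self.L @ self.s[t])  # theta_t = L s_t

# Example: two tiny synthetic regression tasks sharing the same basis.
rng = np.random.default_rng(0)
ella = SimpleELLA(d=5, k=3)
for task_id in (0, 1):
    X = rng.standard_normal((20, 5))
    ella.fit_task(task_id, X, X @ rng.standard_normal(5))
```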
The power of ELLA lies in its many efficiency optimizations, chief among them its use of the θ functions to know exactly which basis knowledge is useful! If you want a more in-depth understanding of ELLA, I highly encourage you to check out the pseudocode and explanation in the original paper.
Using ELLA as a base, we can imagine creating a generalizable AI that could learn any task it's presented with. We again have the property that the more our knowledge basis grows, the more 'relevant information' it contains, which further increases the speed of learning new tasks! It seems as if ELLA could be the core of one of the super-intelligent artificial learners of the future!
Voyager
What happens when we integrate the latest leap in AI, LLMs, with Lifelong ML? We get something that can beat Minecraft (that is the setting of the actual paper)!
Guanzhi Wang, Yuqi Xie, and others saw the new opportunity offered by the power of GPT-4 and decided to combine it with the ideas from lifelong learning you've learned so far to create Voyager.
When it comes to learning games, typical algorithms are given predefined final goals and checkpoints that they exist solely to pursue. In open-world games like Minecraft, however, there are countless possible goals to pursue and a near-infinite amount of space to explore. What if our goal is to approximate human-like self-motivation combined with increased time efficiency on traditional Minecraft benchmarks, such as getting a diamond? Specifically, let's say we want our agent to be able to decide on feasible, interesting tasks, learn and remember skills, and continue to explore and seek new goals in a 'self-motivated' way.
Toward these goals, Wang, Xie, and others created Voyager, which they call the first LLM-powered embodied lifelong learning agent!
How does Voyager work?
At a high level, Voyager uses GPT-4 as its main 'intelligence function', and the model itself can be separated into three parts (roughly sketched in code after this list):
- Automatic curriculum: This decides which goals to pursue and can be regarded as the model's "motivator". Implemented with GPT-4, it is instructed to optimize for difficult yet feasible goals and to "discover as many diverse things as possible" (read the original paper to see the exact prompts). If we pass 4 rounds of the iterative prompting mechanism loop without the agent's environment changing, we simply select a new task!
- Skill library: a set of executable actions such as craftStoneSword() or getWool() that increase in difficulty as the learner explores. The skill library is represented as a vector database, where the keys are embedding vectors of GPT-3.5-generated skill descriptions and the values are the executable skills in code form. GPT-4 generates the code for the skills, optimized for generalizability and refined by feedback from using the skill in the agent's environment!
- Iterative prompting mechanism: This is the element that interacts with the Minecraft environment. It first queries its Minecraft interface to gather details about its current environment, for example the items in its inventory and the surrounding creatures it can observe. It then prompts GPT-4 and performs the actions specified in the output, also offering feedback about whether the specified actions are impossible. This repeats until the current task (as decided by the automatic curriculum) is completed. At completion, we add the learned skill to the skill library. For example, if our task was to create a stone sword, we now put the skill craftStoneSword() into our skill library. Finally, we ask the automatic curriculum for a new goal.
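Below is a very rough sketch of how these three parts might fit together in code. Everything here (the llm() stub, the class names, the placeholder execution check) is an illustrative assumption based only on the description above, not Voyager's actual implementation; the four-round retry comes from the curriculum behavior described in the first bullet.

```python
from dataclasses import dataclass, field

def llm(prompt: str) -> str:
    """Hypothetical stand-in for a GPT-4 call; returns canned text so the sketch runs."""
    return "craftStoneSword"

@dataclass
class SkillLibrary:
    skills: dict = field(default_factory=dict)  # skill name -> executable code

    def add(self, name: str, code: str) -> None:
        self.skills[name] = code

@dataclass
class VoyagerSketch:
    library: SkillLibrary = field(default_factory=SkillLibrary)

    def propose_task(self, env_state: str) -> str:
        # Automatic curriculum: ask the LLM for a difficult-but-feasible, novel goal.
        return llm(f"Environment: {env_state}. Propose a new, feasible, diverse task.")

    def attempt(self, task: str, env_state: str, max_rounds: int = 4) -> bool:
        # Iterative prompting: generate code, run it, feed errors/state back in.
        for _ in range(max_rounds):
            code = llm(f"Write code for task '{task}' given environment: {env_state}.")
            environment_changed = True  # placeholder for executing `code` in Minecraft
            if environment_changed:
                self.library.add(task, code)  # store the new skill for future reuse
                return True
        return False  # no progress after max_rounds: the curriculum picks a new task

agent = VoyagerSketch()
state = "inventory: 3 cobblestone, 2 sticks; nearby: sheep, cave entrance"
task = agent.propose_task(state)
agent.attempt(task, state)
```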
Now, where does Lifelong Learning fit into all this?
When we encounter a new task, we query our skill database to find the top 5 most relevant skills for the task at hand (for instance, relevant skills for the task getDiamonds() would be craftIronPickaxe() and findCave()).
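A minimal sketch of that top-k lookup is below. The real Voyager embeds GPT-3.5-generated skill descriptions and queries a vector database; here a simple bag-of-words cosine similarity stands in for those embeddings, so the skill names and scoring are illustrative assumptions, not the paper's code.

```python
from collections import Counter
import math

skill_descriptions = {
    "craftIronPickaxe": "craft an iron pickaxe from iron ingots and sticks",
    "findCave": "locate and explore a nearby cave to mine ores",
    "getWool": "shear a sheep to collect wool",
}

def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k_skills(task_description: str, k: int = 5) -> list[str]:
    # Score every stored skill description against the task and return the best k.
    query = bag_of_words(task_description)
    scores = {name: cosine(query, bag_of_words(desc))
              for name, desc in skill_descriptions.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# "cave" and "iron pickaxe" overlap with findCave and craftIronPickaxe, so they rank first.
print(top_k_skills("mine diamonds in a cave using an iron pickaxe"))
```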
Thus, we've used previous tasks to learn our new task more efficiently: the essence of lifelong learning! Through this method, Voyager continually explores and grows, learning new skills that expand its frontier of possibilities, which raises the ambition of its goals and, in turn, the power of its newly learned skills!
Compared with other models like AutoGPT, ReAct, and Reflexion, Voyager discovered 3.3x as many new items, navigated 2.3x longer distances, unlocked the wooden level of the tech tree 15.3x faster (in terms of prompt iterations), and was the only one to unlock the diamond level of the tech tree! Furthermore, after training, when dropped into a completely new environment with no items, Voyager consistently solved previously unseen tasks, while the others couldn't solve any within 50 prompts.
As a display of the importance of Lifelong Learning: without the skill library, the model's progress in learning new tasks plateaued after 125 iterations, whereas with the skill library, it kept rising at the same high rate!
Now imagine this agent applied to the real world! Imagine a learner with infinite time and infinite motivation that could keep expanding its possibility frontier, learning faster and faster the more prior knowledge it has! I hope by now I've properly illustrated the power of Lifelong Machine Learning and its capacity to prompt the next transformation of AI!
If you're interested in exploring LLML further, I encourage you to read Zhiyuan Chen and Bing Liu's book, which lays out the potential future paths LLML might take!
Thanks for making it all the way here! If you're interested, check out my website anandmaj.com, which has my other writing, projects, and art, and follow me on Twitter @almondgod.
Original Papers and other Sources:
Ruvolo and Eaton: ELLA: An Efficient Lifelong Learning Algorithm
Wang, Xie, et al: Voyager
Chen and Liu, Lifelong Machine Learning (Inspired me to jot down this!): https://www.cs.uic.edu/~liub/lifelong-machine-learning-draft.pdf
Unsupervised LL with Curricula: https://par.nsf.gov/servlets/purl/10310051
Deep LL: https://towardsdatascience.com/deep-lifelong-learning-drawing-inspiration-from-the-human-brain-c4518a2f4fb9
Neuro-inspired AI: https://www.cell.com/neuron/pdf/S0896-6273(17)30509-3.pdf
Embodied LL: https://lis.csail.mit.edu/embodied-lifelong-learning-for-decision-making/
LL for sentiment classification: https://arxiv.org/abs/1801.02808
Lifelong Robot Learning: https://www.sciencedirect.com/science/article/abs/pii/092188909500004Y
Knowledge Basis Idea: https://arxiv.org/ftp/arxiv/papers/1206/1206.6417.pdf
Q-Learning: https://link.springer.com/article/10.1007/BF00992698
AGI LLLM LLMs: https://towardsdatascience.com/towards-agi-llms-and-foundational-models-roles-in-the-lifelong-learning-revolution-f8e56c17fa66
DEPS: https://arxiv.org/pdf/2302.01560.pdf
Voyager: https://arxiv.org/pdf/2305.16291.pdf
Meta-Learning: https://machine-learning-made-simple.medium.com/meta-learning-why-its-a-big-deal-it-s-future-for-foundation-models-and-how-to-improve-it-c70b8be2931b
Meta Reinforcement Learning Survey: https://arxiv.org/abs/2301.08028