Also, there is some analogy here that is trying to break into my brain, between the trans/immanence contradistinction or relationship and the structure/agency confabulations we dance within with duos like credit/blame, but it does not quite arrive in mind, transcendental or not... inchoate, it arises felt but not grasped, feared but not got.
okay, …we are in the middle, dancing on a threshold like Janus,
① before us is the immanent, to which we are a 'transcendent field' once we are aware of 'things' enough to think like animals with subject/object tools/classes/words/grammars/recursions.
② ahead of us lies the transcendent, which is a hypothetical heaven, probably much like we are, except it hasn't happened yet... of course once it does, it is hindsight, but creating the transcendental in the image of hindsight, this is déclassé, even when it isn't idolatry.
③ thus the 'transcendental field' is empirically available to us (it's what we do, where we are), but the transcendent of this experience is not available to us, and we must suspend judgement, I would argue, or turn this idol into the enemy, like the enemy wants.
④ my sense of the immanent goes back to at least the precursors of life; I suspect Deleuze puts it closer to (self)consciousness, if a little self-consciously.
And thus my voice gets digital laryngitis, my experience is normalised, homogenised; will all trace disappear? https://arxiv.org/abs/2401.16380
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Pratyush Maini, Skyler Seto, He Bai, David Grangier, Yizhe Zhang, Navdeep Jaitly
Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased. Current scaling laws show that learning from such data requires an abundance of both compute and data, which grows with the size of the model being trained. This is infeasible both because of the large compute costs and duration associated with pre-training, and the impending scarcity of high-quality data on the web. In this work, we propose Web Rephrase Augmented Pre-training (WRAP) that uses an off-the-shelf instruction-tuned model prompted to paraphrase documents on the web in specific styles such as "like Wikipedia" or in "question-answer format" to jointly pre-train LLMs on real and synthetic rephrases. First, we show that using WRAP on the C4 dataset, which is naturally noisy, speeds up pre-training by ∼3x. At the same pre-training compute budget, it improves perplexity by more than 10% on average across different subsets of the Pile, and improves zero-shot question answer accuracy across 13 tasks by more than 2%. Second, we investigate the impact of the re-phrasing style on the performance of the model, offering insights into how the composition of the training data can impact the performance of LLMs in OOD settings. Our gains are attributed to the fact that re-phrased synthetic data has higher utility than just real data because it (i) incorporates style diversity that closely reflects downstream evaluation style, and (ii) has higher 'quality' than web-scraped data.
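(A minimal sketch, for my own notes, of what a WRAP-style rephrasing pass might look like, going only off the abstract above. The model choice, prompt wording, and the 1:1 real/synthetic interleaving are my assumptions, not the paper's actual recipe.)

```python
# Sketch of WRAP-style data rephrasing, inferred from the abstract; not the authors' code.
# Assumes a Hugging Face instruction-tuned model as the off-the-shelf paraphraser.
from transformers import pipeline

rephraser = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed stand-in for the paper's paraphraser
    device_map="auto",
)

# Two of the styles named in the abstract; the exact prompt wording is a guess.
STYLE_PROMPTS = {
    "wikipedia": "Rewrite the following web text in the clear, factual style of a Wikipedia article:\n\n{doc}\n\nRewrite:",
    "qa": "Convert the following web text into a question-and-answer format:\n\n{doc}\n\nQ&A:",
}

def rephrase(doc: str, style: str = "wikipedia", max_new_tokens: int = 512) -> str:
    """Return one synthetic paraphrase of a web document in the requested style."""
    prompt = STYLE_PROMPTS[style].format(doc=doc)
    out = rephraser(prompt, max_new_tokens=max_new_tokens, do_sample=False)
    # The pipeline returns prompt + continuation; keep only the continuation.
    return out[0]["generated_text"][len(prompt):].strip()

def wrap_mix(real_docs, style: str = "wikipedia"):
    """Yield real and rephrased documents interleaved, for joint pre-training."""
    for doc in real_docs:
        yield doc                     # the original, noisy web text
        yield rephrase(doc, style)    # its cleaner synthetic rephrase
```

The abstract reports a roughly 3x pre-training speed-up and a >10% average perplexity gain from this kind of real-plus-rephrased mix on C4; the sketch above only covers the data side, not the joint pre-training itself.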