TiredOldCrow t1_iz865hz wrote
Mechanical Turk actually allows this. There's a special "Adult Content" Qualification.
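If you're setting this up through the API, here's a rough boto3 sketch of attaching that qualification to a HIT. The qualification type ID is the one I believe AWS documents for the system "Worker_Adult" qualification, and the HIT parameters are just placeholders; double-check both against the current MTurk docs.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# System "Worker_Adult" qualification (verify this ID against the current MTurk docs).
ADULT_QUAL_ID = "00000000000000000060"

hit = mturk.create_hit(
    Title="Image labeling (contains adult content)",
    Description="Workers must hold the adult content qualification.",
    Reward="0.10",
    MaxAssignments=3,
    LifetimeInSeconds=86400,
    AssignmentDurationInSeconds=600,
    Question=open("question.xml").read(),  # standard QuestionForm/ExternalQuestion XML
    QualificationRequirements=[{
        "QualificationTypeId": ADULT_QUAL_ID,
        "Comparator": "EqualTo",
        "IntegerValues": [1],
    }],
)
print(hit["HIT"]["HITId"])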
TiredOldCrow t1_ivpr1wm wrote
Reply to [D] Video: The New AI Model Licenses have a Legal Loophole (OpenRAIL-M of BLOOM, Stable Diffusion, etc.) by ykilcher
For reference, here's a link to the BigScience RAIL License.
The license includes usage restrictions that specifically forbid harmful uses (many of which are already illegal under existing law), as well as other uses that the model authors are uneasy about (e.g., medical advice, law enforcement, and automated decision-making).
Researchers are effectively attempting to morally (and perhaps legally) exonerate themselves from abusive uses of the technology they have developed. With this, they also hope to retain the right to approve or deny usage in sensitive areas on a case-by-case basis.
Personally, I'm sympathetic to the dilemma faced by researchers, given the large potential for abuse of these models and the relative lack of regulation of AI systems in some jurisdictions. That said, I believe usage restrictions are ideally enforced through hard legislation, not software licensing. Existing laws and policies, such as Article 22 of the GDPR or Canada's TBS Directive on Automated Decision Making, should serve as a template.
TiredOldCrow t1_iv8tqar wrote
Reply to [R] Reincarnating Reinforcement Learning (NeurIPS 2022) - Google Brain by smallest_meta_review
I know it's naive to expect machine learning to imitate life too closely, but for animals, "models" that are successful enough to produce offspring pass on elements of those "weights" to their children through nature+nurture.
The idea of weighting more successful previous models more heavily when "reincarnating" future models seems interesting to me, as does borrowing some concepts from genetic algorithms for combining multiple successful models.
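For illustration only, a naive sketch of what I mean by fitness-weighted "reincarnation" of parent networks. The models, fitness scores, and averaging scheme are assumptions on my part, not anything from the paper, and naive weight averaging only makes sense when the parents share the same architecture.
import torch

def reincarnate(parent_state_dicts, fitnesses):
    """Blend parent networks into a starting point for a new agent,
    weighting each parent by its normalized fitness score."""
    w = torch.tensor(fitnesses, dtype=torch.float32)
    w = w / w.sum()
    child = {}
    for name in parent_state_dicts[0]:
        stacked = torch.stack([sd[name].float() for sd in parent_state_dicts])
        # Fitness-weighted average; a GA-style variant might instead take each
        # tensor from a single parent (crossover) and add small noise (mutation).
        child[name] = (w.view(-1, *([1] * (stacked.dim() - 1))) * stacked).sum(dim=0)
    return child

# e.g. child_net.load_state_dict(reincarnate([net_a.state_dict(), net_b.state_dict()], [0.8, 0.3]))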
TiredOldCrow t1_iv0s1bg wrote
Reply to comment by dojoteef in [N] Class-action lawsuit filed against GitHub, Microsoft, and OpenAI regarding the legality of GitHub Copilot, an AI-using tool for programmers by Wiskkey
Great read, thanks for that. Updated the comment.
TiredOldCrow t1_iuzp1y3 wrote
Reply to [N] Class-action lawsuit filed against GitHub, Microsoft, and OpenAI regarding the legality of GitHub Copilot, an AI-using tool for programmers by Wiskkey
I appreciate that the legendary "fast inverse square root" code from Quake 3 gets produced verbatim, comments and all, if you start with "float Q_rsqrt".
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;
    x2 = number * 0.5F;
    y = number;
    i = * ( long * ) &y; // evil floating point bit level hacking
    i = 0x5f3759df - ( i >> 1 ); // what the fuck?
    y = * ( float * ) &i;
    y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
    // y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
    return y;
}
I'm interested in how practical it will be for a motivated attacker to poison a code generation model with vulnerable code. I'm also curious to what extent these models produce code that only works with outdated and vulnerable dependencies -- a problem you'll also run into if you naively copy old StackOverflow posts. I've recently been working on threat models in natural language generation, but it seems like threat models for code generation are also going to be interesting.
Edit: Not John Carmack!
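On the outdated-dependency point, here's a rough sketch of what screening model-suggested pins against a vulnerability database could look like. The generated snippet and package pins are made up for illustration; the OSV query endpoint is real.
import re
import requests

generated_snippet = """
# requirements suggested by the code assistant
flask==0.12.2
pyyaml==3.12
"""

# Query the OSV database (https://osv.dev) for each pinned PyPI dependency.
for name, version in re.findall(r"^([A-Za-z0-9_.-]+)==([\w.]+)\s*$", generated_snippet, re.M):
    resp = requests.post(
        "https://api.osv.dev/v1/query",
        json={"package": {"name": name, "ecosystem": "PyPI"}, "version": version},
        timeout=10,
    )
    vulns = resp.json().get("vulns", [])
    if vulns:
        print(f"{name}=={version}: {len(vulns)} known advisories, e.g. {vulns[0]['id']}")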
TiredOldCrow t1_its89ot wrote
Since you're using different pre-trained VGG16 models as a starting point, you may just be demonstrating that the PyTorch torchvision model is more amenable to your combination of hyperparameters than the TensorFlow one.
Ideally for this kind of comparison you'd use the exact same pretrained model architecture+weights as a starting point. Maybe look for a set of weights that has been ported to both PyTorch and TensorFlow?
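For example, something along these lines. This is a rough sketch that only ports the convolutional stack, assumes the two VGG16 implementations line up layer-for-layer, and leaves the classifier head and input preprocessing to be handled the same way.
import torch
import torchvision.models as models
import tensorflow as tf

pt_model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
tf_model = tf.keras.applications.VGG16(weights="imagenet")

pt_convs = [m for m in pt_model.features if isinstance(m, torch.nn.Conv2d)]
tf_convs = [l for l in tf_model.layers if isinstance(l, tf.keras.layers.Conv2D)]

for pt_layer, tf_layer in zip(pt_convs, tf_convs):
    # PyTorch stores conv kernels as (out, in, H, W); Keras expects (H, W, in, out).
    kernel = pt_layer.weight.detach().numpy().transpose(2, 3, 1, 0)
    bias = pt_layer.bias.detach().numpy()
    tf_layer.set_weights([kernel, bias])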
Submitted by TiredOldCrow t3_ydgwnq in MachineLearning
TiredOldCrow t1_it7kkg2 wrote
Sympathetic to the complaint, but don't know if you've linked a good example here.
I'll go to bat for Jason Brownlee. He's provided some really excellent hands-on tutorials over the years, repeatedly updates his blogs based on feedback, and overall has made the field much more accessible.
TiredOldCrow OP t1_isvei0e wrote
My own thinking is that large venues might implement automated filtering using detection models. Detection accuracy for machine-generated text increases with sequence length, so papers containing large amounts of generated text stand reasonable odds of being flagged.
That said, the results of this detection would likely need to be scrutinized by a reviewer anyway (especially if conferences don't ban AI writing assistants altogether).
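As a rough illustration of what that filtering could look like, here's a sketch using the RoBERTa-based GPT-2 output detector from the Hugging Face Hub as a stand-in. The specific model choice and the idea of thresholding its score are my assumptions, and no detector like this is reliable on its own.
from transformers import pipeline

# RoBERTa fine-tuned on GPT-2 outputs; accuracy drops sharply on short texts
# and on newer generators, so treat this as a noisy signal, not a verdict.
detector = pipeline("text-classification", model="openai-community/roberta-base-openai-detector")

candidate_text = "..."  # e.g., a paper abstract or section, ideally a few hundred tokens
result = detector(candidate_text, truncation=True)[0]
print(result)  # e.g. {'label': 'Fake', 'score': 0.98} -> flag for human review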
Submitted by TiredOldCrow t3_y7mwmw in MachineLearning
TiredOldCrow t1_j4wdufa wrote
Reply to [D] Do you know of any model capable of detecting generative model(GPT) generated text ? by CaptainDifferent3116
Nothing works consistently, especially if an attacker tests their own outputs against the open-source detectors or makes manual tweaks to the outputs.
Survey paper