jetro30087

jetro30087 t1_je8mtjp wrote

This is an updated dataset for the 7B model, but you could train the other sizes with the same data. From anecdotal reports, the dataset seems to have a greater impact on the model's performance than the parameter count, up to a point. Fewer parameters mean a faster model. More parameters mean the model can produce longer responses.

https://huggingface.co/8bit-coder/alpaca-7b-nativeEnhanced
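A rough way to see why fewer parameters means a faster model is to estimate how much memory the weights alone occupy. This is a back-of-envelope sketch, not a benchmark. It assumes that generating each token requires reading every weight once, so a smaller footprint roughly translates into faster tokens on the same hardware. The parameter counts are nominal LLaMA-style sizes, and the bytes-per-parameter table is an illustrative assumption covering common storage formats.

```python
# Back-of-envelope estimate of the memory taken by a model's weights.
# Assumption: generation reads every weight once per token, so a smaller
# footprint roughly means faster tokens on the same hardware.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gib(n_params: float, dtype: str) -> float:
    """Approximate size of the weights alone in GiB (ignores activations and cache)."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Nominal model sizes; real checkpoints differ slightly from these round numbers.
    for size in (7e9, 13e9, 30e9, 65e9):
        row = ", ".join(f"{d}: {weight_gib(size, d):.1f} GiB" for d in BYTES_PER_PARAM)
        print(f"{size / 1e9:.0f}B params -> {row}")
```

The same arithmetic explains why people run the 7B model on consumer hardware: its weights fit in far less memory than the larger sizes.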

jetro30087 t1_je8l2a4 wrote

Close. The data is structured to pair prompts with appropriate responses. LLaMA was trained mostly on publicly available datasets curated by AI researchers, covering a variety of topics. Here's an excerpt from a fine-tuned Alpaca model's training data:

[
{
    "instruction": "Give three tips for staying healthy.",
    "input": "",
    "output": "1. Eat a balanced diet and make sure to include plenty of fruits and vegetables. \n2. Exercise regularly to keep your body active and strong. \n3. Get enough sleep and maintain a consistent sleep schedule."
},
{
    "instruction": "What are the three primary colors?",
    "input": "",
    "output": "The three primary colors are red, blue, and yellow."
},
{
    "instruction": "Describe the structure of an atom.",
    "input": "",
    "output": "An atom is made up of a nucleus, which contains protons and neutrons, surrounded by electrons that travel in orbits around the nucleus. The protons and neutrons have a positive charge, while the electrons have a negative charge, resulting in an overall neutral atom. The number of each particle determines the atomic number and the type of atom."
},
{
    "instruction": "How can we reduce air pollution?",
    "input": "",
    "output": "There are a number of ways to reduce air pollution, such as shifting to renewable energy sources, encouraging the use of public transportation, prohibiting the burning of fossil fuels, implementing policies to reduce emissions from industrial sources, and implementing vehicle emissions standards. Additionally, individuals can do their part to reduce air pollution by reducing car use, avoiding burning materials such as wood, and changing to energy efficient appliances."
}
]

There are 50K+ examples in the file that the model uses to build relationships between questions and appropriate responses during training. If enough relationships are built, the model should be able to infer the correct answer to questions it hasn't seen before.
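To make the prompt/response relationship concrete, here is a minimal sketch of how records in this format can be turned into training pairs. The prompt template is modeled on the one published in the Stanford Alpaca repository. The exact wording used for the nativeEnhanced finetune is an assumption, and `to_training_pairs` is a hypothetical helper name.

```python
import json

# Template modeled on the Stanford Alpaca repo; assumption: other finetunes
# (including nativeEnhanced) may word it differently.
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)


def to_training_pairs(records):
    """Hypothetical helper: turn Alpaca-style records into (prompt, target) pairs."""
    pairs = []
    for rec in records:
        # Records with an empty "input" field use the shorter template.
        template = PROMPT_WITH_INPUT if rec.get("input") else PROMPT_NO_INPUT
        prompt = template.format(instruction=rec["instruction"], input=rec.get("input", ""))
        pairs.append((prompt, rec["output"]))
    return pairs


if __name__ == "__main__":
    sample = json.loads("""[
      {"instruction": "What are the three primary colors?", "input": "",
       "output": "The three primary colors are red, blue, and yellow."}
    ]""")
    prompt, target = to_training_pairs(sample)[0]
    print(prompt + target)
```

During fine-tuning, the model sees the prompt and is trained to predict the target text that follows it. Repeating this over tens of thousands of pairs is what builds the question-to-response relationships described above.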

jetro30087 t1_jdu9slz wrote

How's that different from any Star Trek episode where a crew member goes to the holodeck and instructs the Enterprise's computer to build a program?

It's not inventing a program; it's completing a command using the information stored in its programming, according to the rules set by that programming. It codes because it's trained on terabytes of code that perform tasks. When you ask for code that performs one of those tasks, it's just retrieving that information and altering it somewhat based on the rules that govern its response. Unlike humans, however, it isn't compelled to design a program that does anything without being prompted.

jetro30087 t1_iudriy9 wrote

Capitalism will solve it once the price of water rises enough to make building desalination plants profitable, and not a second sooner. Look forward to trading fresh water on the futures market.