llm
LLM - Large Language Model.
News: this version continues to show new results for extended use of the current LLM data set.
I do not yet have a complete syntax or semantics for the programming language that should be used to obtain the result shown in point 20.
If you have advice or wishes regarding point 20, you are welcome to share them. :-)
For whom.
Perhaps you don't want to work with open or commercial LLMs, don't want to dig into all the knowledge needed to build an LLM yourself, and just want to build your own data to work with an LLM. One of the brightest said that all models are wrong, but some are useful. Then this corner of the Internet is for you. It's not that complicated. But first, let's look at a ready-made LLM; it is intentionally very, very simplified. And it's all free of charge.
To start
-
Copy llm.zip to a folder of your choice on your computer.
-
Unzip the archive.
-
Go into the llm folder.
-
Launch a console.
Here I have to say a little bit more. I have used two types of consoles: bash and Windows PowerShell. The odd problem I have met is that the Up Arrow key sometimes does not work in bash, while in Windows PowerShell it works all the time. I use the Up Arrow key to quickly recall already typed sequences of words. So far it does not bother me much.
- LLM training.
We perform LLM training on ready-made data; this is a very important part of your work. The data set for training is located in the textbook folder as text files that simulate arithmetic operations from 0 + 0 = 0 to 9 + 9 = 18, in both digits and words. The record format can be viewed by opening any file from the textbook folder in your editor. To train the llm, go into the llm folder if you are not there yet, launch the console there, and type: llm.exe -td 1 or ./llm.exe -td 1. The file data.txt should appear in the brains folder; it contains the data on the basis of which llm will communicate with you.
A bit of explanation. The -td parameter is an abbreviation for ToDo, and the value 1 tells llm to take files from the textbook folder and create data for itself in the brains folder.
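The kind of training data described above can be sketched as a small generator. This is only an illustration: the exact record format of the real textbook files is not shown in this document, so the two line formats below (one in digits, one in words) are assumptions based on the sample dialogues later on.

```go
package main

import "fmt"

// Words for the digits 0..9 and the sums 10..18 (assumed spelling).
var word = []string{
	"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
	"ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
}

// records returns two textbook-style lines for a+b: one in digits, one in words.
func records(a, b int) (string, string) {
	digits := fmt.Sprintf("%d+%d=%d", a, b, a+b)
	words := fmt.Sprintf("%s plus %s equals %s", word[a], word[b], word[a+b])
	return digits, words
}

func main() {
	// Emit every operation from 0+0 to 9+9, as the textbook folder does.
	for a := 0; a <= 9; a++ {
		for b := 0; b <= 9; b++ {
			d, w := records(a, b)
			fmt.Println(d)
			fmt.Println(w)
		}
	}
}
```

Again, check the real files in the textbook folder for the actual format before relying on this sketch.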
- Work with LLM.
While in the same folder, type: llm.exe -td 2, which tells llm to enter conversation mode with the user.
llm prints an invitation to start communication; below is a sample dialogue of your possible communication with llm through the console:
user:->1+1=<press enter>
llm: typed 1+1
llm: result: =2
user:2+2=<press enter>
llm:result:4
user: 2+u=<press enter>
llm:error:word u could not be evaluated
Explanations. Yes, you (under my supervision) made a mistake - it happens. Now we are going to correct it.
user:2+8=<press enter>
llm:result:10
Now you can test llm's work with words, not just numbers.
user: nil plus nil equals <press enter>
llm: result: nil
user: eight plus eight equals <press enter>
llm: result:sixteen
To leave the program, type exit and press <enter>.
- Setting up llm operation using the config.txt file
The config.txt file is located in the config subfolder. You have 4 parameters available:
evaluator, chatter, store and capacity.
evaluator has two options:
⦁ simple - returns the next available word in order.
⦁ random - returns a word chosen randomly from possible ones.
chatter has two options:
⦁ completer - llm completes the sentence you typed.
⦁ teller - "tries" to maintain a "conversation" with you by choosing the last word of the previous sentence to begin a new one.
store - specifies the type of storage llm should use.
⦁ ram - the data model is processed in memory.
⦁ file - all data is kept in the file system, and only the part of the data necessary to compute the result is loaded into RAM.
When you use the store parameter with the file option, llm prints the list of incoming files it processes one by one, and finally prints a message that the brains data creation process is over. It's a little more fun than just waiting for the LLM to finish its work.
One more thing: before launching llm in data set creation mode, I would advise deleting the appropriate files from the brains folder:
⦁ ram option: delete data.txt
⦁ file option: delete the set of files matching the pattern data_[index].txt
I will fix this in an upcoming version.
capacity
sets the target size of each file where llm will save its data. The default value is 16384, but in reality it takes 3 times more. That is my fault; I'll fix it in the next version. Sorry.
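Putting the four parameters together, a config.txt could look like the fragment below. The exact keyword spelling and value syntax are assumptions based on the descriptions above; check the real config.txt shipped in the config subfolder for the actual format.

```text
evaluator=simple
chatter=completer
store=ram
capacity=16384
```

Later sections mention further keywords (the HTTP port, cache, defragment, multithreading.file.amount); they would live in the same file.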
-
Now you may communicate with the LLM via an HTTP port. A sample client was added to the zip file; you can find it in the client subfolder.
-
The client uses just 3 parameters. The new one is the text the server should process; its name is parse. The other two are evaluator and chatter; read point 7 for explanations. There is a sample of code in the client folder.
-
The response has 3 parameters to be processed by the client: error, result and last. Using them, you are able to build your communication with the LLM via the HTTP port.
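A minimal client-side sketch in Go. The request parameter names (parse, evaluator, chatter) and the response fields (error, result, last) come from this document; everything else here — the query-string encoding, a JSON response body, the bare root path — is an assumption, so compare against the real sample in the client subfolder before using it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// Response mirrors the three fields the server is documented to return.
type Response struct {
	Error  string `json:"error"`
	Result string `json:"result"`
	Last   string `json:"last"`
}

// buildRequestURL encodes the three request parameters as a query string
// (the encoding is an assumption, not taken from the real client).
func buildRequestURL(base, parse, evaluator, chatter string) string {
	v := url.Values{}
	v.Set("parse", parse)
	v.Set("evaluator", evaluator)
	v.Set("chatter", chatter)
	return base + "?" + v.Encode()
}

// decodeResponse parses a (hypothetically JSON) body into the three fields.
func decodeResponse(body []byte) (Response, error) {
	var r Response
	err := json.Unmarshal(body, &r)
	return r, err
}

func main() {
	u := buildRequestURL("http://localhost:8080", "eight plus eight equals", "simple", "completer")
	fmt.Println(u)
	r, _ := decodeResponse([]byte(`{"error":"","result":"sixteen","last":"sixteen"}`))
	fmt.Println(r.Result) // sixteen
}
```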
-
When llm is running in HTTP server mode, you need to tell llm what type of evaluator and chatter it should use for each request. In that case, the LLM ignores the config.txt values for those parameters.
-
To run llm in HTTP server mode, enter ./llm.exe -td 3 in a console running in the llm subfolder. There is a keyword in the configuration file that specifies the port number to use; the default is 8080. The client should use the same port number, of course.
-
The cache option in the config.txt file sets the number of files the LLM is able to keep in RAM to speed up its work. The default value is 23. You may choose a number at your convenience.
-
Adding new sequences. There is a new subfolder named add, which holds files describing the sequences to be added. To start adding new sequences, place them in the add subfolder and type ./llm.exe -td 5 in the console. The program will begin adding the new sequences to the LLM data set, and during the process it will show which sequences are being added. An example file is located in the add subfolder.
-
Deleting old sequences. There is a new subfolder named delete, which holds files describing the sequences to be removed. To start removing old sequences, place them in the delete subfolder and type ./llm.exe -td 4 in the console. The program will begin removing the old sequences from the LLM data set, and during the process it will show which sequences are being removed. An example file is located in the delete subfolder.
-
I have slightly expanded the set of sequences for the addition and deletion examples. You can see them in the corresponding subfolders, add and delete.
-
Defragmentation. Starting with the current version, the defragment parameter has been added to the config.txt file, together with its processing.
It has two options, amount and execute. With the first, you can find out the number of logical items that can be deleted; with the second, you order llm to free the unused disk space.
- Multithreading. During llm data set creation it is now possible to process input files either sequentially or in parallel. The parameter has two options, true and false.
Explanation. The source data is grouped into files by similarity criteria; see the textbook folder. The parameter multithreading.file.amount indicates the number of files to be processed simultaneously. LLM reads one record from each file and processes the whole set of read records in parallel; then it reads the next set of records. When the current files are exhausted, the program selects the next set of unprocessed files, if any remain in the current folder.
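The batch scheme described above — read one record from each of N files, process that set in parallel, repeat — can be sketched with goroutines. This is a simplified illustration only: the real per-record work belongs to llm, so it is replaced here by a placeholder process function.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// process stands in for llm's real per-record work; here it just upper-cases.
func process(record string) string {
	return strings.ToUpper(record)
}

// processBatch handles one record from each "file" in parallel and returns
// when the whole batch is done, like one round of reads described above.
func processBatch(records []string) []string {
	out := make([]string, len(records))
	var wg sync.WaitGroup
	for i, r := range records {
		wg.Add(1)
		go func(i int, r string) {
			defer wg.Done()
			out[i] = process(r)
		}(i, r) // pass loop variables explicitly to avoid capture bugs
	}
	wg.Wait()
	return out
}

func main() {
	// One record read from each of three files.
	batch := []string{"1+1=2", "one plus one equals two", "2+2=4"}
	fmt.Println(processBatch(batch))
}
```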
-
To get brief help on how to use the program, type in the console: ./llm.exe -h
-
Explanation. The extended option no longer exists for the evaluator parameter in the config.txt file. The program now knows by itself when it is necessary to use extended mode.
Sample: the data set has sequences 1+1=2, 3+5=8, 9+9=18 and so on. For the full description, see the files in the textbook folder.
Now, when you type the sequence 11 + oneone plus oneone + 1, the system gives out the result 34.
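To see where 34 comes from: every token in the mixed sequence maps to a number, so 11 + oneone(=11) + oneone(=11) + 1 = 34. A small sketch of that token arithmetic follows; the word-to-number table is mine for illustration, while the real llm derives such results from its data set, not from a hard-coded map.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// value maps word tokens to numbers; "oneone" reads as the digits one,one = 11
// (an assumed interpretation of the sample above).
var value = map[string]int{"oneone": 11}

// evaluate sums a mixed sequence, skipping the operator tokens "+" and "plus".
func evaluate(sequence string) int {
	sum := 0
	for _, tok := range strings.Fields(sequence) {
		if tok == "+" || tok == "plus" {
			continue
		}
		if n, ok := value[tok]; ok {
			sum += n
			continue
		}
		n, _ := strconv.Atoi(tok) // plain digit tokens like "11"
		sum += n
	}
	return sum
}

func main() {
	fmt.Println(evaluate("11 + oneone plus oneone + 1")) // 34
}
```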
-
Well, now you know everything needed to create data for setting up your own llm. It's up to you. Experiment as you wish.
-
Memo. The program is written in Go for Windows 10, and multithreading is now available in data creation mode. There is a plan for its further development.
-
If you find an error and wish to send me a description, please provide a data set with which the bug can be reproduced. That would be very kind.
-
If there is a modification you need, or you are simply curious, drop me a message. Mail: gussev@hotmail.com
-
English is not my first language; sorry for any inconvenience.