ABOUT OPENAI CHATGPT
OpenAI ChatGPT (Chat Generative Pre-trained Transformer) is a chatbot developed by OpenAI and launched in November 2022. It is built on top of OpenAI’s GPT-3 family of large language models and has been fine-tuned (an approach to transfer learning) using both supervised and reinforcement learning techniques.
ChatGPT was launched as a prototype on November 30, 2022, and quickly garnered attention for its detailed responses and articulate answers across many domains of knowledge. Its uneven factual accuracy, however, was identified as a significant drawback. Following the release of ChatGPT, OpenAI’s valuation was estimated at US$29 billion.
Although the core function of a chatbot is to mimic a human conversationalist, ChatGPT is versatile. For example, it can write and debug computer programs, compose music, teleplays, fairy tales, and student essays; answer test questions (sometimes, depending on the test, at a level above the average human test-taker); write poetry and song lyrics; emulate a Linux system; simulate an entire chat room; play games like tic-tac-toe; and simulate an ATM. ChatGPT’s training data includes man pages and information about Internet phenomena and programming languages, such as bulletin board systems and the Python programming language.
ChatGPT – a generative pre-trained transformer (GPT) – was fine-tuned (an approach to transfer learning) on top of GPT-3.5 using supervised learning as well as reinforcement learning. Both approaches used human trainers to improve the model’s performance. In the case of supervised learning, the model was provided with conversations in which the trainers played both sides: the user and the AI assistant. In the reinforcement step, human trainers first ranked responses that the model had created in a previous conversation. These rankings were used to create ‘reward models’ that the model was further fine-tuned on using several iterations of Proximal Policy Optimization (PPO). Proximal Policy Optimization algorithms present a cost-effective benefit to trust region policy optimization algorithms; they negate many of the computationally expensive operations with faster performance. The models were trained in collaboration with Microsoft on their Azure supercomputing infrastructure.
In addition, OpenAI continues to gather data from ChatGPT users that could be used to further train and fine-tune ChatGPT. Users are allowed to upvote or downvote the responses they receive from ChatGPT; upon upvoting or downvoting, they can also fill out a text field with additional feedback. Source : Wikipedia