Where is AI, where am I?
This story is meant to catch you up with AI, even if it moves further ahead while you are reading.
You see AI news and announcements almost every day, a wave triggered by advancements in ChatGPT and by the ability to train on large datasets, which results in better AI suggestions.
Google’s Transformer architecture has been a breakthrough: it allows a neural network to learn context and meaning by tracking relationships in sequential data (like the words in a sentence) and to learn in parallel, which has accelerated training. It is also the last letter of ChatGPT, which is built on the Transformer architecture.
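To make that concrete, here is a minimal sketch of the scaled dot-product self-attention at the heart of the Transformer, written in plain NumPy with toy dimensions and random data (not the production architecture): every token’s vector is rebuilt as a weighted mix of all the tokens in the sequence, and because the whole score matrix is computed in one shot, the token pairs are processed in parallel.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core Transformer operation: every query attends to every key."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise token similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ V                                # blend values by attention weight

# Toy example: 4 tokens with 8-dimensional embeddings (numbers are arbitrary).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)           # self-attention: Q = K = V
print(out.shape)                                      # (4, 8)
```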
Latest applications of AI
The latest advancements have opened big doors for the AI productivity applications you can see below.
ChatGPT
Nowadays it is the 1st application that comes to mind when we say AI.
So let’s look at the most important part about ChatGPT: how does it work?
There are 3 stages in the training of ChatGPT (Chat Generative Pre-trained Transformer):
Stage 1: Generative Pre-Training
The Transformer is trained on huge amounts of text from all over the internet: websites, books, articles and more. Many capabilities, including language modeling, summarization, translation, and sentiment analysis, emerge from this intensive stage.
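In practice, this pre-training objective is next-token prediction. The snippet below is a deliberately over-simplified illustration of that objective, with a tiny embedding plus a linear layer standing in for the full Transformer and made-up token ids:

```python
import torch
import torch.nn.functional as F

# Illustrative stand-in for a language model: embedding + linear "prediction head".
vocab_size, d_model = 100, 16
embedding = torch.nn.Embedding(vocab_size, d_model)
head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.tensor([5, 12, 7, 42, 9])     # a made-up token sequence
inputs, targets = tokens[:-1], tokens[1:]    # predict each next token from the prefix

logits = head(embedding(inputs))             # shape (4, vocab_size)
loss = F.cross_entropy(logits, targets)      # the language-modeling objective
loss.backward()                              # gradients that drive the parameter update
```

The real model repeats this over an enormous corpus, which is where the capabilities listed above come from.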
Stage 2: Supervised Fine-Tuning (SFT)
The model is then trained on specific tasks that are relevant to what users are looking for, including conversational chat. The main aim is to match users’ expectations.
- The 1st step in this stage is to form sets of crafted conversations. These conversations are created by one human agent chatting with another human agent who pretends to be a chatbot and gives “ideal” responses.
- The conversation history is then used as input and the ideal next response as output, forming the training corpus. The text is converted into tokens, which are used to update the base GPT model’s parameters.
- The base GPT model is trained on this corpus using the Stochastic Gradient Descent (SGD) algorithm. Training runs in rounds, and after each round feedback is given to the model to correct its faults; the process repeats until the model gets really good at conversation. SGD is the optimization algorithm that keeps nudging a model’s parameters until the cost function is minimized (see the sketch below).
During this stage, the parameters of the ChatGPT base model are updated to capture task-specific information that was not there before SFT.
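Below is a rough sketch of what such an SGD round could look like, assuming a toy stand-in model and an invented two-example corpus of tokenized (history, ideal response) pairs; the real pipeline works the same way but at a vastly larger scale, and typically computes the loss only on the response tokens.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in for the base GPT model (the real one is a large Transformer).
vocab_size, d_model = 100, 16
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, d_model),
    torch.nn.Linear(d_model, vocab_size),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Invented corpus: (tokenized conversation history, tokenized ideal next response).
corpus = [
    (torch.tensor([3, 17, 8]), torch.tensor([21, 4])),
    (torch.tensor([9, 2]), torch.tensor([33, 7, 1])),
]

for round_ in range(3):                           # training "rounds"
    for history, ideal_response in corpus:
        sequence = torch.cat([history, ideal_response])
        inputs, targets = sequence[:-1], sequence[1:]
        logits = model(inputs)
        loss = F.cross_entropy(logits, targets)   # penalize wrong next tokens
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                          # SGD nudges the parameters
```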
Stage 3: Reinforcement Learning from Human Feedback (RLHF)
In this stage, the agent interacts with its environment and learns to make decisions by being rewarded or punished. Human agents write responses and other humans rank them. These request-response pairs teach a reward model which response is better, and the reward model gives ChatGPT a high score when it responds with the best option.
But there is a trap in this stage known as Goodhart’s Law:
“when a measure becomes a target, it ceases to be a good measure.”
To deal with this issue, an extra step has been added to the stage: Kullback-Leibler (KL) divergence, a measure of the difference between two probability distributions. It tells how much information is lost when one distribution is used to approximate the other, and it is commonly used in Machine Learning. The model knows it is in trouble when the KL divergence grows too high, and this step closes the trap in ChatGPT.
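For reference, the KL divergence between two discrete distributions P and Q is D_KL(P || Q) = Σ p(x) log(p(x) / q(x)). The toy sketch below (with invented probabilities, not ChatGPT’s actual token distributions) shows how it can act as a drift penalty:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(P || Q) = sum over x of p(x) * log(p(x) / q(x))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

# Invented next-token distributions over a 4-word vocabulary.
base_model = [0.40, 0.30, 0.20, 0.10]    # distribution before RLHF updates
tuned_model = [0.70, 0.10, 0.10, 0.10]   # distribution after chasing the reward

penalty = kl_divergence(tuned_model, base_model)
print(f"KL divergence: {penalty:.3f}")   # grows as the tuned model drifts away

# Subtracting a term proportional to this penalty from the reward discourages the
# model from drifting too far from its base behaviour just to maximize the score.
```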
Finally, after this stage is completed, ChatGPT is ready to answer your questions. It only knows what it has been taught, so if you ask it something it has not been trained on, it will give you a more or less random answer stitched together in the style it was trained on.
Gemini
It is Google’s long-awaited answer to ChatGPT. According to Google, it is their “most capable” AI model ever, trained on video, images, audio and text. It has been available to developers through Google Cloud’s API since December 13, 2023. Google’s Bard, a chatbot similar to ChatGPT, and the suggested replies on the Pixel 8 keyboard are already powered by Gemini Pro, and Gemini is expected to be introduced into Google’s main products in 2024.
Google says Gemini Ultra scores 90% on the Massive Multitask Language Understanding (MMLU) benchmark, higher than any other model including GPT-4. For safety, it was also tested using a dataset of toxic model prompts developed by the Allen Institute for AI.
Alexei Efros, a UC Berkeley professor who specializes in the visual capabilities of AI, says Google’s general approach with Gemini appears promising. However, he adds: “…the problem with all these proprietary models, we don’t really know what’s inside”.
Now let’s look at the latest applications in the Computer Vision field of AI.
StyleGAN
The Generative Adversarial Network (GAN) is a class of Machine Learning frameworks and a prominent approach to generative AI that pits one neural network against another.
StyleGAN generates two images and combines them, taking low-level features from one and high-level features from the other. The generator uses a mixing regularization technique, so some percentage of each source appears in the output image, and per-pixel noise is added after each convolution layer.
The StyleGAN method works by gradually increasing the resolution, ensuring that the network evolves slowly, first learning a simple problem before progressing to more complex ones. Instead of generating a single image it generates multiple ones, and this technique allows styles or features to be dissociated from each other.
StyleGAN can, for example, be trained to reconstruct historical pictures.
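To illustrate the adversarial idea underneath, here is a generic GAN training step with toy linear networks and random placeholder data; StyleGAN itself builds on this with style mixing, per-pixel noise and progressively growing resolution.

```python
import torch
import torch.nn as nn

latent_dim, image_dim = 16, 64     # toy sizes; real images are far larger

# Generator maps random noise to a fake "image"; discriminator judges realness.
generator = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, image_dim))
discriminator = nn.Sequential(nn.Linear(image_dim, 64), nn.ReLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real_images = torch.randn(8, image_dim)   # placeholder for a real training batch

# Discriminator step: label real images 1 and generated images 0.
fake_images = generator(torch.randn(8, latent_dim)).detach()
d_loss = (bce(discriminator(real_images), torch.ones(8, 1))
          + bce(discriminator(fake_images), torch.zeros(8, 1)))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: try to make the discriminator call the generated images real.
fake_images = generator(torch.randn(8, latent_dim))
g_loss = bce(discriminator(fake_images), torch.ones(8, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```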
Verification
Entrupy has developed an app to detect imitation and fake bags. It verifies authenticity by checking many product traits such as color, stitching, and leather patterns.
Video solutions
Dragonfruit, a San Francisco-based start-up, has developed a stock-out solution that applies computer vision to monitor products on store shelves and send real-time alerts via email, text, or integrations into the user’s own system.
Bard lets you chat with YouTube videos, while Microsoft Copilot can summarize an entire video. Neural Radiance Fields (NeRFs) allow detailed 3D scenes to be generated from a series of 2D images.
Pika labs has recently released a text-to-video offering.
Video-to-text advancements have the potential to compete with human video commentators.
Animate Anyone can turn pictures into videos.
Advancements in the audio modality show the potential for Spotify to be flooded with AI music that could surpass Bruno Mars and Taylor Swift.
ChatGPT’s ability to identify actions from camera pictures hints at how much data AI-powered CCTV cameras will generate, one of many reasons why AI needs to be regulated and why the EU is preparing the 1st AI Law. Let’s look at it.
1st AI Law
On December 9, 2023, Council and European Parliament negotiators reached a provisional agreement on the EU AI Act, aiming for a global AI landscape that is ethical, safe, and trustworthy.
It is a risk-based regulation, categorizing AI systems into 4 risk levels:
- Minimal or No Risks: The majority of AI systems with negligible risks can continue without regulation.
- Limited Risks: AI systems with manageable risks are subject to light transparency obligations to empower users with informed decision-making.
- High Risks: A broad spectrum of high-risk AI systems will be authorised but with stringent requirements and obligations to access the EU market.
- Unacceptable Risks: Systems deemed to pose unacceptable risks, including cognitive manipulation, predictive policing, emotion recognition in workplaces and schools, social scoring, and certain remote biometric identification systems, will be banned, with limited exceptions.
Top skills for AI roles
Now let’s review the top requirements for current AI roles. This should give you a final idea of where you stand relative to Artificial Intelligence. Relevant experience and a university degree in Computer Science/AI/Mathematics are the top qualifications asked for in these roles.
The top skills wanted for today’s leading AI roles:
Machine Learning Engineer
- Python/Java/Scala
- TensorFlow/PyTorch/PySpark/Scikit-learn
- Spark/Hadoop/SQL
- CI/CD, testing
Data Scientist
- Spark/SQL/Hadoop/Pig/Hive/MapReduce
- Python/Scala
- NumPy/Pandas/TensorFlow/PyTorch/PySpark/Scikit-learn
NLP Engineer
- NLTK/SpaCy/Gensim
- GPT-3/Llama/T5/BERT
- Transformers/PyTorch/JAX
- Python/Scala
Robotics Engineer
- ROS/ROS2
- Python/C++