Level Up Your AI Strategy: From Models to Applications and Beyond*

Mikhail Chrestkha
10 min read · Jan 4, 2024

*Title generated by Gemini Pro on Vertex AI

image by author

I ended my last LinkedIn post of 2023 with “Don’t get overwhelmed with all the new [AI] research, tools, and launches. Start building a mental model of what capabilities & tools you actually need for your use cases.” I took my own advice and scanned my 2023 LinkedIn posts and identified the 5 key takeaways & themes I learned along the way.

  • Which layer of the AI tech stack is right for you?
  • LLMs vs. LLM Systems & Applications
  • Deploy AI applications, NOT just models
  • Where does your unique data come into play for GenAI?
  • Which type of AI models & tools does your use case and application need?

Full posts below for consolidated reading:

(Feb '23) Which layer of the AI tech stack is right for you?

https://www.linkedin.com/posts/mchrestkha_which-layer-of-the-ai-tech-stack-is-right-activity-7030801339534237697-CFvi

image by author

Gone are the days (for most of us) of needing to craft neural networks layer by layer and neuron by neuron on a scarce GPU server. The AI tools ecosystem continues to evolve at a fast pace, providing higher-level abstractions for ease of use and deployment. But with more choices comes more confusion and risk.

Choose too low and you may get stuck maintaining infrastructure and boilerplate code rather than innovating. Choose too high and you may get stuck with a generic solution that is hard to customize and doesn't fully solve your use case. The perfect answer, of course, doesn't exist: the right choice will often vary based on your use case, team skill set, and data availability.

The diagram and list below provide an example spectrum of AI software, AI infrastructure, and AI users to consider when choosing your AI tech stack.

From bottom to top:
- AI researchers pushing the leading edge with lower-level frameworks like PyTorch, TensorFlow, the Keras Functional API, and JAX. They tend to want low-level CLI access to virtual machines and clusters so they can optimize and debug their code.
- Data Scientists & ML Engineers looking to accelerate experimentation, simplify code management, and ship code faster. They tend to work with a mix of off-the-shelf OSS models (KerasNLP, Hugging Face, TF Hub) and higher-level frameworks like scikit-learn, XGBoost, the Keras Sequential API, PyTorch Lightning, & fastai. Instead of managing underlying infrastructure, they prefer managed platforms that speed the path from ideation to production deployment.
- Developers looking to embed AI capabilities into their applications with Node.js, Ruby, Python, Go, etc. They tend to look for fully managed API endpoints that take a simple input and return an AI-predicted or generated output, and they apply creativity to add AI-powered features to their products and applications.
- Business users looking for an end-to-end SaaS application to solve a business task or process. They tend to look for AI assistance within a full GUI-based tool that requires no coding knowledge. Examples range from large ERP & CRM systems with AI capabilities (e.g., SAP, Anaplan, Salesforce Einstein) to smaller, narrower applications (e.g., Jasper AI, FLYR Labs, Google Docs, Google Sheets).

Which layer do you play in? I have a soft spot for the green layer but am amazed by what you can now do in the top two layers, and I'm still wowed by the innovation happening at the bottom layer (JAX, PyTorch, TPUs, NVIDIA GPUs). As we pair Predictive AI capabilities with Generative AI capabilities, it will be interesting to see the impact on each of these layers.

(Apr '23) LLMs vs. LLM Systems & Applications

https://www.linkedin.com/posts/mchrestkha_coming-back-after-3-weeks-off-felt-like-i-activity-7047090541561319424-95s8/

image by author

Coming back after 3 weeks off felt like I was away for a year in AI. AI and specifically large language models (LLMs) are moving that fast! It’s easy to feel lost. Take a breath. Choose an area that excites you, learn the fundamentals, and get prompting!

I caught up with Mr. BigQuery & Data Cloud Joe Malone today and we talked about how to step back and get familiar with the bigger concepts rather than get overwhelmed by every new AI research paper, model, and tool hitting the news. Below is one approach that I’m using to calm my mind!

(1) What area excites me the most? Is it (a) the AI research, (b) the tooling to better harness the tech, or (c) the application of this tech to a particular industry or use case?

Currently for me it's (b), the tooling, best practices and architectures to leverage this technology most effectively. This will allow me to help users, teams and companies with (c).

(2) Learn the bigger fundamentals and concepts. After my conversation with Joe Malone and a discussion with Rajesh Thallam, Mukul Goyal & Druva Reddy Tiruvuru, I quickly sketched out the image below for some of the themes. I realized there were three areas I want to keep a pulse on:

- the LLMs themselves with their natural language interface
- the data, systems, and pipelines we want to augment them with
- the new AI tools and concepts to get familiar with, including prompting (zero-shot, few-shot), embeddings, vector databases, and tuning pipelines (see the prompting sketch after this list)
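
If the prompting jargon is new, here is a minimal sketch of zero-shot vs. few-shot prompting; `call_llm` is a hypothetical placeholder for whichever model API you use:

```python
# Hypothetical helper: swap in whichever model API you have access to
# (PaLM on Vertex AI, GPT, Claude, etc.).
def call_llm(prompt: str) -> str:
    return "(model output)"  # replace with a real API call

# Zero-shot: ask for the task directly, with no examples.
zero_shot = call_llm(
    "Classify the sentiment of this review as positive or negative: "
    "'The battery died after two days.'"
)

# Few-shot: prepend a handful of labeled examples to steer format and behavior.
few_shot = call_llm(
    "Review: 'Loved the camera quality.' Sentiment: positive\n"
    "Review: 'Shipping took forever.' Sentiment: negative\n"
    "Review: 'The battery died after two days.' Sentiment:"
)
```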

(3) Nothing beats getting your hands dirty. Play and prototype with any tool you have access to for a few hours a week. Last I spoke with Tom Frantzen, he had built a trip itinerary planner (integrated with the Google Maps SDK) and a wine recommender in a few days using LLM APIs.

- Chat services: Bard, ChatGPT, You.com, Bing
- Language APIs: GPT, Cohere, Anthropic, AI21, PaLM & Vertex LLMs
- Image APIs: Midjourney, DALL·E
- OSS libraries: KerasCV, KerasNLP, Hugging Face Transformers
- Any SaaS app built on these models: Replit, Jasper, Canva, etc.

So what am I going to try next week? I'll be playing with LangChain (https://lnkd.in/gywDtsBB) while trying the approaches Valliappa Lakshmanan laid out in his great blog post: https://lnkd.in/gmimhGNP
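
For anyone who wants to follow along, here is the kind of minimal LangChain chain I plan to start with; this is a sketch assuming the classic PromptTemplate + LLMChain pattern and LangChain's Vertex AI wrapper (the model name is illustrative, and any LangChain-supported LLM can be swapped in):

```python
from langchain.llms import VertexAI          # assumes langchain + the Vertex AI SDK installed
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# A template with one input variable to fill at call time.
prompt = PromptTemplate(
    input_variables=["city"],
    template="Suggest a one-day itinerary for a first-time visitor to {city}.",
)

llm = VertexAI(model_name="text-bison")      # illustrative; swap in your LLM of choice
chain = LLMChain(llm=llm, prompt=prompt)

print(chain.run(city="Lisbon"))
```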

(May '23) Deploy AI applications, NOT just models

https://www.linkedin.com/posts/mchrestkha_multi-task-foundation-machine-learning-activity-7057247082230808576-t_Q6

image by author

Multi-task Foundation (Machine Learning) Models are the magic behind new predictive and generative AI capabilities. Here's a before-and-after look at why developers should be excited about the potential of foundation models.

Even in many "AI-first" applications last year, the ML models and the AI systems were a small part of the application. They required a mix of single-task ML models and heuristics for their AI system (e.g., recommendation systems, search engines, image classifiers, fraud detectors, chatbots). As a developer you put a lot of work into an ML model that may (1) never make it into the application, (2) power a small feature, or (3) in the best case, support an AI system core to the application.

Today, one multi-task foundation model can power the core AI engine behind an entire application. Chaining together natural language prompts lets you analyze, extract, troubleshoot, explain, summarize, and generate data to create novel capabilities and experiences within your application. Not only can developers accelerate how they prototype, experiment, and build new AI capabilities, but they can also make a much bigger impact on the broader application, if not build it fully themselves.
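
To make the chaining idea concrete, here is a minimal sketch where one foundation model plays three roles in sequence; `call_llm` is a hypothetical stand-in for a single model endpoint:

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a single foundation model API call.
    return "(model output)"

ticket = "My CSV export fails with a timeout after about 5 minutes on large datasets."

# One model, three chained prompts: summarize -> classify -> draft a reply.
summary = call_llm(f"Summarize this support ticket in one sentence:\n{ticket}")
category = call_llm(f"Classify this ticket as billing, bug, or how-to:\n{summary}")
reply = call_llm(
    f"Draft a friendly first response for a '{category}' ticket.\nSummary: {summary}"
)
```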

The picture on the left has been discouraging to me the last few years. I just haven’t had time to even try to build any projects outside of a few models in Jupyter notebooks.

It is still very early and we have a lot to learn about foundation models (especially when it comes to safety, privacy, and copyright attribution), but even as I've moved to a non-technical role, the potential of the picture on the right has re-energized me!

(Nov '23) Where does your unique data come into play for GenAI?

https://www.linkedin.com/posts/mchrestkha_weve-all-heard-some-form-of-your-data-is-activity-7121033737714233344-B5kc

image by author

We’ve all heard some form of ‘your data is your differentiator’ but in the world of off-the-shelf Generative AI models, where does your data drive the most value?

All of us from the Predictive AI era are used to the mindset that data matters most when building a train/validation/test modeling dataset early in the ML lifecycle. Once you had trained the model, you still had a lot of work to build an inference pipeline, but you generally knew the input data features and output format of your model endpoint.
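
As a reminder of that earlier mindset, here is a minimal sketch of that early-lifecycle step with scikit-learn (toy data and illustrative split sizes):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy features and labels standing in for your modeling dataset.
X, y = np.random.rand(1000, 10), np.random.randint(0, 2, size=1000)

# A 70/15/15 train/validation/test split: the classic first step
# of the Predictive AI lifecycle described above.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)
```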

Now you can just grab a model pre-trained by Meta, TII, OpenAI, Google, Anthropic, Cohere, Mistral AI, etc. and start generating predictions. And these predictions can be large chunks of generated content! Where does my data actually add value in this new world?

With Generative AI, your unique data is just as important as it was for Predictive AI (if not more so), but in different ways. Here are some thoughts and a diagram inspired by conversations with Vincent Ciaravino, Lillian McNealus, Jennifer Otitigbe, and Alex Martin.

(1) Your unique data becomes more important during the serving step of the model lifecycle. This includes the fast-evolving Retrieval Augmented Generation (RAG) design pattern, where you retrieve relevant data (generally through embeddings and similarity search) and feed it as input into the model to generate a response. Remy Welch put together a fun example of using RAG with Vertex AI's Text Embeddings and PaLM language model on her own book! https://lnkd.in/gRdenmVq
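
Here is a deliberately tiny sketch of the RAG pattern described above; `embed` and `call_llm` are hypothetical stand-ins for your embedding model and LLM endpoint (e.g., Vertex AI Text Embeddings and PaLM in Remy's example):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in: call your embedding model here
    # (e.g., Vertex AI Text Embeddings).
    return np.random.rand(768)

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in: call your text-generation model here (e.g., PaLM).
    return "(model answer)"

documents = ["chunk 1 of your private docs...", "chunk 2...", "chunk 3..."]
doc_vectors = [embed(d) for d in documents]

def answer(question: str, k: int = 2) -> str:
    # Retrieve: rank chunks by cosine similarity to the question.
    q = embed(question)
    scores = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v)) for v in doc_vectors]
    top_chunks = [documents[i] for i in np.argsort(scores)[-k:]]
    # Augment & generate: feed the retrieved chunks into the prompt.
    context = "\n".join(top_chunks)
    return call_llm(f"Using only this context:\n{context}\n\nAnswer: {question}")

print(answer("What does the policy say about refunds?"))
```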

(2) Higher-quality curated data becomes very important for fine-tuning these large multi-task models. Parameter-Efficient Fine-Tuning (PEFT) is a fast-evolving space that allows you to focus and redirect these large models toward your specific task (e.g., classifying content against your taxonomy, formatting the output for your needs). Here is a great tutorial from Nikita Namjoshi on supervised tuning on Vertex AI: https://lnkd.in/g28xzt9z
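
Nikita's tutorial covers the managed Vertex AI route; as a rough open-source illustration of the same PEFT idea, here is LoRA attached to a small causal LM with Hugging Face's peft library (the base model and target modules are illustrative assumptions):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

# A small open base model; swap in whatever you are actually tuning.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

# LoRA freezes the base weights and trains small low-rank adapters instead.
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # adapter rank: the "efficient" knob
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of base parameters
```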

(3) The most important data may be the data you don't have today. Capturing it requires patiently and carefully building a pipeline to collect your model's predictions, along with a way to collect user preferences or corrected outputs. This is the promise of RLHF tuning, which is available on Vertex AI from Sihui (May) Hu and Yixin (Bethany) Wang: https://lnkd.in/gT4zzuEH
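
That pipeline starts with plumbing more than modeling. Here is a minimal sketch, assuming a hypothetical logging schema, of capturing each prediction alongside a human preference or correction so a tuning dataset accumulates over time:

```python
import json
import time
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class FeedbackRecord:
    """Hypothetical schema for logging predictions with human feedback."""
    prompt: str
    model_output: str
    preferred: bool                  # thumbs up / thumbs down
    corrected_output: Optional[str]  # a human rewrite, when one is offered
    timestamp: float

def log_feedback(record: FeedbackRecord, path: str = "feedback.jsonl") -> None:
    # Append-only JSONL: easy to batch into a tuning pipeline later.
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")

log_feedback(FeedbackRecord(
    prompt="Summarize our refund policy.",
    model_output="Refunds are issued within 30 days.",
    preferred=False,
    corrected_output="Refunds are issued within 14 days of purchase.",
    timestamp=time.time(),
))
```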

(4) Finally, if you believe you do have lots of unique data AND the appetite for some expensive R&D, you can explore using your own data to pre-train your models at the very beginning of the ML lifecycle. TPUs and NVIDIA GPUs on Vertex AI are available for those who want to take the plunge.

What are your thoughts? Feel free to leave a comment on how your AI workflows are shifting and where in the lifecycle your unique data is driving the most value.

(Dec '23) Which type of AI models & tools does your use case and application need?

https://www.linkedin.com/posts/mchrestkha_im-over-here-still-learning-about-tuning-activity-7140928955401838594-dfme

image by author

I’m over here still learning about tuning and RAG while everyone seems to have moved on to agents and multimodal?! What a time to be in the AI space!

Lillian McNealus asked our team to get some rest tonight after a fun & busy few months leading up to today's Gemini (and more) launches, but I had to end the day with a recap of my favorite announcements:

- Want to get started with Gemini in minutes with a Gmail account & API key for FREE? Head over to https://ai.google.dev/ (Lauren Usui, Nate Keating, Manvinder Singh)
- Need enterprise scale, or already a Google Cloud user? Test Gemini on Vertex AI for FREE until mid-January https://lnkd.in/gfxCm-wp with a brand-new samples folder in our GitHub repo https://lnkd.in/g_BKAEz7 (Keelin McDonell, Dave Elliott, Polong Lin, Erwin Huizenga, Nikita Namjoshi)
- Already using LangChain or LlamaIndex? Gemini examples are ready to try [LlamaIndex: https://lnkd.in/gBPMjtxH] [LangChain: https://lnkd.in/g7-ejKKQ] (Abhi Kunté, Mukul Goyal)
- Need function calling to more confidently generate method requests from Gemini? Julia Wiesinger and Mark Goldenson have you covered: https://lnkd.in/gu-Yf-4Y (see the sketch after this list)
- Want to maintain large-model quality at the cost and latency of a smaller model for a specialized task? Sihui (May) Hu & Yixin (Bethany) Wang launched Distillation Step by Step as a tuning pipeline on Vertex AI: https://lnkd.in/gTmxM9Er
- How about grounding your LLMs on your private enterprise data, with citations, without having to deal with embeddings & vector DBs? Just point to a Google Cloud Storage bucket or other GCP data sources. Alan Blount, Diem Vu, and Lewis Liu have an easy button for you, with a great demo and blog write-up from Sascha Heyer: https://lnkd.in/gDq6424R
- And most importantly, Vertex AI's continued investment in safety filters & scores, indemnity protection, data privacy, and enterprise security & compliance (Neama Dadkhahnikoo, Vincent Ciaravino)
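
On function calling: the core pattern (sketched generically below, not the actual Gemini API; see the docs linked above for the real thing) is that the model emits a structured request naming a function and its arguments, and your code executes it:

```python
import json

# A real function the model is allowed to request.
def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}

TOOLS = {"get_order_status": get_order_status}

def call_llm_with_tools(prompt: str) -> str:
    # Hypothetical stand-in: a function-calling model responds with structured
    # JSON naming a function and its arguments instead of free-form text.
    return json.dumps({"function": "get_order_status", "args": {"order_id": "A123"}})

# Your code parses the structured request and dispatches to the real function.
request = json.loads(call_llm_with_tools("Where is order A123?"))
result = TOOLS[request["function"]](**request["args"])
print(result)  # {'order_id': 'A123', 'status': 'shipped'}
```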

Two concluding thoughts:

(1) Don’t get overwhelmed with all the new research, tools, and launches. Start building a mental model of which capabilities & tools you actually need for your use cases. You don’t need Gemini to forecast inventory, but Gemini can unlock new multimodal customer experiences. You probably don’t need RAG or grounding to generate creative content, but you will need grounding for a chat assistant helping with instructions from company manuals.

(2) Start playing, testing, and building. Even Nenshad Bardoliwalla finds time to use Gemini to describe the interior and exterior of a rental property for a listing using video, image, and text input! https://lnkd.in/g6MxFbvF Shri Chary, Joy Bonggao, Dan Diasio, and I have a New Year’s resolution to shake the cobwebs off our hands-on-keyboard skills from our time as data tech consultants!

— — — — — — — — — — — — — — — —

Follow me on Linkedin as I post my learnings on AI going into 2024: https://www.linkedin.com/in/mchrestkha/
