Introducing Gemma models in Keras

Posted by Martin Görner - Product Manager, Keras

The Keras team is happy to announce that Gemma, a family of lightweight, state-of-the-art open models built from the same research and technology that we used to create the Gemini models, is now available in the KerasNLP collection. Thanks to Keras 3, Gemma runs on JAX, PyTorch and TensorFlow. With this release, Keras is also introducing several new features specifically designed for large language models: a new LoRA API (Low Rank Adaptation) and large-scale model-parallel training capabilities.

If you want to dive directly into code samples, head here:

Get started

Gemma models come in portable 2B and 7B parameter sizes, and deliver significant advances against similar open models, and even some larger ones. For example:

Gemma 7B scores a new best-in-class 64.3% of correct answers in the MMLU language understanding benchmark (vs. 62.5% for Mistral-7B and 54.8% for Llama2-13B)

Gemma adds +11 percentage points to the GSM8K benchmark score for grade-school math problems (46.4% for Gemma 7B vs. Mistral-7B 35.4%, Llama2-13B 28.7%)

and +6.1 percentage points of correct answers in HumanEval, a coding challenge (32.3% for Gemma 7B, vs. Mistral 7B 26.2%, Llama2 13B 18.3%).
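The margins quoted above are plain differences in percentage points against the stronger of the two baselines. As a minimal sketch, they can be recomputed from the scores listed in this post (the dictionary layout and the function name are illustrative only):

```python
# Scores (percent of correct answers) as quoted in this post.
scores = {
    "MMLU": {"Gemma 7B": 64.3, "Mistral-7B": 62.5, "Llama2-13B": 54.8},
    "GSM8K": {"Gemma 7B": 46.4, "Mistral-7B": 35.4, "Llama2-13B": 28.7},
    "HumanEval": {"Gemma 7B": 32.3, "Mistral-7B": 26.2, "Llama2-13B": 18.3},
}


def margin_over_best_baseline(benchmark_scores, model="Gemma 7B"):
    """Percentage-point gap between `model` and the best other model."""
    best_other = max(v for k, v in benchmark_scores.items() if k != model)
    return round(benchmark_scores[model] - best_other, 1)


margins = {name: margin_over_best_baseline(s) for name, s in scores.items()}
for name, gap in margins.items():
    print(f"{name}: +{gap} percentage points")
```

In all three benchmarks the closest baseline is Mistral-7B, so each margin is measured against it.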

Gemma models are offered with a familiar KerasNLP API and a super-readable Keras implementation. You can instantiate the model with a single line of code:

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")

And run it directly on a text prompt. Yes, tokenization is built in, although you can easily split it out if needed: read the KerasNLP guide to see how.

gemma_lm.generate("Keras is a", max_length=32)
> "Keras is a popular deep learning framework for neural networks..."

Try it out here: Get started with Gemma models

Fine-tuning Gemma Models with LoRA

Thanks to Keras 3, you can choose the backend on which you run the model. Here is how to switch:

import os
os.environ["KERAS_BACKEND"] = "jax"  # Or "tensorflow" or "torch".
import keras  # import keras after having chosen the backend
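Because Keras reads the backend setting when it is first imported, the environment variable has to be set beforehand. One way to guard against typos is a small wrapper such as the sketch below; `select_keras_backend` and `SUPPORTED_BACKENDS` are hypothetical names used for illustration and are not part of Keras.

```python
import os

# The three backend names mentioned in this post.
SUPPORTED_BACKENDS = ("jax", "tensorflow", "torch")


def select_keras_backend(name):
    """Set KERAS_BACKEND; call this before the first `import keras`."""
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"Unknown backend {name!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    os.environ["KERAS_BACKEND"] = name
    return name


select_keras_backend("jax")
# import keras  # import only after the variable has been set
```

Changing the variable after Keras has been imported has no effect on the running process, which is why the helper is meant to run first.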

Keras 3 comes with several new features specifically for large language models. Chief among them is a new LoRA API (Low Rank Adaptation) for parameter-efficient fine-tuning. Here is how to activate it:

gemma_lm.backbone.enable_lora(rank=4)
# Note: rank=4 replaces the weights matrix of relevant layers with the
# product AxB of two matrices of rank 4, which reduces the number of
# trainable parameters.

This single line drops the number of trainable parameters from 2.5 billion to 1.3 million!
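To see where a reduction of this kind comes from, it helps to count parameters for a single layer. In the usual LoRA formulation the original kernel is frozen and only two small factors, A and B, are trained. The sketch below uses a hypothetical 2048 x 2048 kernel chosen for illustration; it does not reproduce Gemma's actual layer shapes or the exact totals quoted above.

```python
def full_params(d_in, d_out):
    """Trainable parameters in a dense kernel of shape (d_in, d_out)."""
    return d_in * d_out


def lora_params(d_in, d_out, rank):
    """Trainable parameters in LoRA factors A (d_in, rank) and B (rank, d_out)."""
    return d_in * rank + rank * d_out


# Hypothetical layer size, for illustration only.
d_in, d_out, rank = 2048, 2048, 4
dense = full_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)
print(f"full kernel: {dense:,} trainable parameters")
print(f"LoRA rank {rank}: {lora:,} trainable parameters")
print(f"reduction factor: {dense // lora}x")
```

The LoRA count grows linearly with the layer width while the full kernel grows quadratically, so the saving increases with model size.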

Try it out here: Fine-tune Gemma models with LoRA.

Fine-tuning Gemma models on multiple GPU/TPUs

Keras 3 also supports large-scale model training and Gemma is the perfect model to try it out. The new Keras distribution API offers data-parallel and model-parallel distributed training options. The new API is meant to be multi-backend but for the time being, it’s implemented for the JAX backend only, because of its proven scalability (Gemma models were trained with JAX).

To fine-tune the larger Gemma 7B, a distributed setup is useful, for example a TPUv3 with 8 TPU cores that you can get for free on Kaggle, or an 8-GPU machine from Google Cloud. Here is how to configure the model for distributed training, using model parallelism:

device_mesh = keras.distribution.DeviceMesh(
    (1, 8),  # Mesh topology
    ["batch", "model"],  # named mesh axes
    devices=keras.distribution.list_devices()  # actual accelerators
)


# Model config
layout_map = keras.distribution.LayoutMap(device_mesh)
layout_map["token_embedding/embeddings"] = (None, "model")
layout_map["decoder_block.*attention.*(query|key|value).*kernel"] = (
    None, "model", None)
layout_map["decoder_block.*attention_output.*kernel"] = (
    None, None, "model")
layout_map["decoder_block.*ffw_gating.*kernel"] = ("model", None)
layout_map["decoder_block.*ffw_linear.*kernel"] = (None, "model")


# Set the model config and load the model
model_parallel = keras.distribution.ModelParallel(
    device_mesh, layout_map, batch_dim_name="batch")
keras.distribution.set_distribution(model_parallel)
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_7b_en")
# Ready: you can now train with model.fit() or generate text with generate()

What this code snippet does is set up the 8 accelerators into a 1 x 8 matrix where the two dimensions are called “batch” and “model”. Model weights are sharded on the “model” dimension, here split between the 8 accelerators, while data batches are not partitioned since the “batch” dimension is 1.
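The effect of a layout on a single tensor can be sketched in plain Python, without any accelerator. The helper below is a hypothetical illustration, not part of the Keras distribution API: it computes the per-device shape obtained when each tensor dimension is either split along a named mesh axis or kept whole (`None`). The tensor shapes are made up, and requiring exact divisibility is a simplifying assumption.

```python
def shard_shape(shape, layout, mesh):
    """Per-device shape of a tensor partitioned according to `layout`.

    `mesh` maps axis names to sizes; `layout` has one entry per tensor
    dimension: an axis name to split on, or None to keep it whole.
    """
    if len(shape) != len(layout):
        raise ValueError("layout must have one entry per tensor dimension")
    result = []
    for dim, axis in zip(shape, layout):
        parts = 1 if axis is None else mesh[axis]
        if dim % parts != 0:
            raise ValueError(f"dimension {dim} is not divisible by {parts}")
        result.append(dim // parts)
    return tuple(result)


mesh = {"batch": 1, "model": 8}  # the 1 x 8 mesh described in this post

# Hypothetical shapes, for illustration only.
weight_shard = shard_shape((4096, 1024), (None, "model"), mesh)
batch_shard = shard_shape((16, 512), ("batch", None), mesh)
print("weight shard per device:", weight_shard)
print("data batch per device:  ", batch_shard)
```

With this mesh, each accelerator holds one eighth of the weight along its second dimension, while the data batch keeps its full shape because the "batch" axis has size 1.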

Try it out here: Fine-tune Gemma models on multiple GPUs/TPUs.

What’s Next

We will soon be publishing a guide showing you how to correctly partition a Transformer model and write the 6 lines of partitioning setup above. It is not very long but it would not fit in this post.

You will have noticed that layer partitionings are defined through regexes on layer names. You can check layer names with this code snippet. We ran this to construct the LayoutMap above.

# This is for the first Transformer block only,
# but they all have the same structure
tlayer = gemma_lm.backbone.get_layer('decoder_block_0')
for variable in tlayer.weights:
    print(f'{variable.path:<58}  {str(variable.shape):<16}')
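To check which partitioning rule a given variable would pick up, the regexes can be tried against variable paths with Python's `re` module. This is a minimal sketch: the paths are hypothetical examples in the style printed by the snippet above, `find_layout` is an illustrative helper, and search-style matching (a key matches if it is found anywhere in the path) is an assumption about how the layout map resolves keys.

```python
import re

# Regex keys and layouts taken from the partitioning setup in this post.
layouts = {
    "token_embedding/embeddings": (None, "model"),
    "decoder_block.*attention.*(query|key|value).*kernel": (None, "model", None),
    "decoder_block.*attention_output.*kernel": (None, None, "model"),
    "decoder_block.*ffw_gating.*kernel": ("model", None),
    "decoder_block.*ffw_linear.*kernel": (None, "model"),
}


def find_layout(path):
    """Return the layout of the first key that matches `path`, else None."""
    for pattern, layout in layouts.items():
        if re.search(pattern, path):
            return layout
    return None


# Hypothetical variable paths, for illustration only.
example_paths = [
    "decoder_block_0/attention/query/kernel",
    "decoder_block_0/attention/attention_output/kernel",
    "decoder_block_0/ffw_gating/kernel",
    "decoder_block_0/pre_attention_norm/scale",
]
for path in example_paths:
    print(f"{path:<52} {find_layout(path)}")
```

A variable that matches no key, such as the normalization scale in the last example, gets no explicit layout from the map.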

Full GSPMD model parallelism works here with just a few partitioning hints because Keras passes these settings to the powerful XLA compiler which figures out all the other details of the distributed computation.

We hope you will enjoy playing with Gemma models. Here is also an instruction-tuning tutorial that you might find useful. And by the way, if you want to share your fine-tuned weights with the community, the Kaggle Model Hub now supports user-tuned weights uploads. Head to the model page for Gemma models on Kaggle and see what others have already created!
