Maybe Convert Wasm Extension Config?

Have you ever said "THIS WORKED YESTERDAY!!"? Yeah, me too. This could also be titled "Living on the edge with Wasm", "The tale of a stray character", or "How I spent way more time than I would like to admit fixing an issue that was completely my fault".

Remember the last time you said to yourself: "But this worked yesterday!!!".
For me, that was today.
If you're in tech, chances are you've said something like this before. Everything was working just fine, you come back the next day, and nothing works anymore. Everything seems the same as you left it, yet stuff is broken.
I've been in tech for a long time, and one thing that I've learned when troubleshooting issues is to always check the most basic/obvious thing first. That makes a lot of sense if you think about it. If a light doesn't turn on, you'll first check if there's power. You're not going to start cutting into drywall to see if all wires are intact.
You can apply the same approach when you're debugging or trying to get something to work the way it should work.
Anyway, onto my story. I've been spending a good portion of my time researching and learning about Wasm extensions and Envoy.
Specifically, I worked on a Wasm HTTP filter that I wanted to run as part of the Envoy filter chain in my Istio mesh (running on Kubernetes). (I think I hit a cloud-native bingo in that previous sentence).
Creating, building, and "publishing" a Wasm extension was reasonably straightforward, thanks to the GetEnvoy CLI.
For context, here's what you need to do to get your Wasm extension deployed to Istio after you've built and published it.
Since the whole world runs on YAML, so do Istio and Kubernetes. I created two EnvoyFilter resources:
The first one downloads the Wasm extension from a remote URI (this is an excellent feature, btw!).
The second one "inserts" the downloaded Wasm into Envoy's filter chain.
Here are the relevant snippets for both so that you get an idea. This first snippet is the one that configures the extension. The relevant portions are the name (hello-wasm) and the uri where the .wasm file lives:
---
configPatches:
  - applyTo: EXTENSION_CONFIG
    match:
      context: SIDECAR_INBOUND
    patch:
      operation: ADD
      value:
        name: hello-wasm
        typed_config:
          '@type': type.googleapis.com/udpa.type.v1.TypedStruct
          type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
          value:
            config:
              vm_config:
                vm_id: hello-wasm-vm
                runtime: envoy.wasm.runtime.v8
                code:
                  remote:
                    http_uri:
                      uri: https://extensionstorageaccount/extension.wasm
The second EnvoyFilter is the one that inserts the downloaded Wasm extension into Envoy's filter chain. The relevant portion here is the name field (hello-wasm) - that's how Envoy knows which extension I am talking about.
---
patch:
  operation: INSERT_BEFORE
  value:
    name: hello-wasm
    config_discovery:
      config_source:
        ads: {}
        initial_fetch_timeout: 0s # wait indefinitely to prevent bad Wasm fetch
      type_urls:
        ['type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm']
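Since the name field is the only thing tying the two resources together, it can be worth checking that both sides agree before deploying. Here's a minimal sketch of that check. It uses plain Python dicts that mirror the two snippets above, trimmed to the relevant keys; names_match is a hypothetical helper of mine, not part of Istio or istioctl.

```python
# Plain dicts mirroring the two EnvoyFilter snippets, trimmed to the keys
# that matter here. These are illustrative stand-ins, not parsed YAML.
extension_config_patch = {
    "applyTo": "EXTENSION_CONFIG",
    "patch": {
        "operation": "ADD",
        "value": {"name": "hello-wasm"},
    },
}

http_filter_patch = {
    "patch": {
        "operation": "INSERT_BEFORE",
        "value": {
            "name": "hello-wasm",
            "config_discovery": {
                "type_urls": [
                    "type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm"
                ],
            },
        },
    },
}


def names_match(ext_patch, filter_patch):
    """Hypothetical helper: True when both patches use the same extension name."""
    ext_name = ext_patch["patch"]["value"]["name"]
    filter_name = filter_patch["patch"]["value"]["name"]
    return ext_name == filter_name


print(names_match(extension_config_patch, http_filter_patch))
```

The comparison is an exact string match, so a single extra character in either name makes it return False.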
Armed with a sample extension and YAML, I deployed everything to a Kubernetes cluster running Istio 1.9.1. All worked great. (I ran into some caching issues where Envoy was "magically" loading the Wasm from somewhere, but that's a story for another time).
Anyway, the next day I wanted to dig more into the caching issues I was seeing. Since I wasn't getting too far (i.e. I couldn't pinpoint what or where the root cause was), I did the equivalent of "turn it off and on again": I got a clean Kubernetes cluster with a clean Istio 1.9.1 installation. Instead of wasting time on a potentially 'broken' cluster or configuration, I figured it would make the most sense to start clean.
I kept notes from the previous day, and I thought it would be minutes before I had the extension deployed and running again. Boy, was I wrong.
I deployed the YAML, and the Envoy sidecar next to the test workload failed to start. Specifically, this was the error from the logs:
goroutine 136 [running]:
istio.io/istio/pkg/wasm.convert(0xc000af34a0, 0x1c0c6e0, 0xc0007c2300, 0xc000af34a0, 0x27c4500)
        istio.io/istio/pkg/wasm/convert.go:82 +0xef8
istio.io/istio/pkg/wasm.MaybeConvertWasmExtensionConfig.func2(0xc000b12bf0, 0xc00000e440, 0x1, 0x1, 0x1c0c6e0, 0xc0007c2300, 0xc000b12bd8, 0x0)
        istio.io/istio/pkg/wasm/convert.go:52 +0x85
created by istio.io/istio/pkg/wasm.MaybeConvertWasmExtensionConfig
        istio.io/istio/pkg/wasm/convert.go:49 +0x154
I deleted the Pod, re-applied the filter YAML - the same thing... Google to the rescue - not many results, unfortunately.
Next, I checked the source code and poked around a bit to see whether the module was even being downloaded. Without digging more into the source code, my assumption at this point was that the module was not being downloaded (i.e. the error was happening before that).
Then I went to check the docs - Wasm in Istio being 'experimental' and all. I didn't remember if there was a different profile I was supposed to use or if there was a magical config setting somewhere. There's no magical setting, and the demo profile should work just fine.
Even though I thought the module was not even being downloaded, I figured I should probably create a new, empty Wasm extension and try with that. Well, SAME DARN THING!!!!!11 Same error - doesn't work.
Time for istioctl analyze - no validation errors. All looks good.
Finally, I used an example YAML and Wasm extension from Istio.io. Lo and behold - that one worked! That meant it was probably not my Istio installation; it had to be either my Wasm file or my EnvoyFilter configuration.
I replaced the URI in the working EnvoyFilter YAML with the URI of the extension I built. That worked! So it's not the extension either. It was (surprise, surprise) YAML...
From here on out, it was pretty straightforward. I did a diff between the working YAML and the broken YAML to see what the differences were.
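If you don't have a diff tool handy, a few lines of Python's standard difflib do the job. This is just a sketch: the two snippets below are hypothetical, and the stray backtick in the "broken" one is made up for illustration. It is not the actual character from my files. The changed_lines helper is also mine.

```python
import difflib

# Hypothetical snippets, NOT the actual files from this post. The stray
# backtick at the end of the "broken" name is invented for illustration.
working = """\
patch:
  operation: INSERT_BEFORE
  value:
    name: hello-wasm
"""

broken = """\
patch:
  operation: INSERT_BEFORE
  value:
    name: hello-wasm`
"""


def changed_lines(a, b):
    """Return only the removed ('-') and added ('+') lines between a and b."""
    diff = list(
        difflib.unified_diff(
            a.splitlines(),
            b.splitlines(),
            fromfile="working.yaml",
            tofile="broken.yaml",
            lineterm="",
        )
    )
    # The first two lines are the '---' / '+++' file headers; skip them,
    # then keep only lines that were removed or added.
    return [line for line in diff[2:] if line[:1] in ("+", "-")]


for line in changed_lines(working, broken):
    print(line)
```

Printing only the changed lines keeps the noise down, which helps when the difference is a single character hiding in an otherwise identical file.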
Here's the diff between the working YAML (left) and broken YAML (right). Can you spot the issue?
[Image: YAML diff]
Is it my fault? For sure. Could the Istio CLI's analyze command have caught this? Probably. Could there have been a better error message? Yep. However, in the end, it was still me who messed up.
Follow me on Twitter for more embarrassing stories and other tech content.
