Voice Canvas - AI Art Director - XeLseD

This sketch creates an interactive AI-powered art generator that listens to voice commands via the Web Speech API, sends descriptions to OpenAI's GPT model to generate visual parameters, and renders abstract animations (waves, particles, circles, or lines) based on the AI's interpretation. The sketch also uses OpenAI's text-to-speech API to speak back descriptions of the generated visuals.

🎓 Concepts You'll Learn

- Web Speech API for voice input
- Asynchronous API calls (fetch)
- JSON parsing and validation
- Procedural animation with trigonometric functions
- Perlin noise for organic motion
- DOM manipulation and event handling
- API key management and security
- State management with flags
- Canvas drawing with shapes and transforms
- Audio playback with p5.sound

🔄 Code Flow

Code flow showing preload, setup, draw, startListening, interpretSpeech, speakBack, drawWaves, drawParticles, drawCircles, drawLines, windowResized, getApiKey

💡 Click on function names in the diagram to jump to their code

graph TD
  start[Start] --> setup[setup]
  setup --> canvas-setup[Canvas Creation]
  setup --> dom-selection[DOM Element Selection]
  setup --> speech-api-check[Web Speech API Availability Check]
  setup --> speech-config[Speech Recognition Configuration]
  setup --> onstart-handler[Speech Start Handler]
  setup --> onresult-handler[Speech Result Handler]
  setup --> onerror-handler[Speech Error Handler]
  setup --> onend-handler[Speech End Handler]
  setup --> draw[draw loop]
  click setup href "#fn-setup"
  click canvas-setup href "#sub-canvas-setup"
  click dom-selection href "#sub-dom-selection"
  click speech-api-check href "#sub-speech-api-check"
  click speech-config href "#sub-speech-config"
  click onstart-handler href "#sub-onstart-handler"
  click onresult-handler href "#sub-onresult-handler"
  click onerror-handler href "#sub-onerror-handler"
  click onend-handler href "#sub-onend-handler"
  draw --> background-clear[Background Clear]
  draw --> visual-switch[Visual Type Switch]
  visual-switch --> drawwaves[drawWaves]
  visual-switch --> drawparticles[drawParticles]
  visual-switch --> drawcircles[drawCircles]
  visual-switch --> drawlines[drawLines]
  draw --> draw[draw loop]
  click draw href "#fn-draw"
  click background-clear href "#sub-background-clear"
  click visual-switch href "#sub-visual-switch"
  click drawwaves href "#fn-drawwaves"
  click drawparticles href "#fn-drawparticles"
  click drawcircles href "#fn-drawcircles"
  click drawlines href "#fn-drawlines"
  drawwaves --> wave-loop[Wave Vertex Generation Loop]
  wave-loop --> sine-calculation[Sine Wave Calculation]
  click wave-loop href "#sub-wave-loop"
  click sine-calculation href "#sub-sine-calculation"
  drawparticles --> particle-loop[Particle Generation Loop]
  particle-loop --> perlin-noise[Perlin Noise Position Calculation]
  click particle-loop href "#sub-particle-loop"
  click perlin-noise href "#sub-perlin-noise"
  drawcircles --> circle-loop[Concentric Circle Loop]
  click circle-loop href "#sub-circle-loop"
  drawlines --> outer-line-loop[Horizontal Line Loop]
  outer-line-loop --> inner-vertex-loop[Vertex Generation Loop]
  inner-vertex-loop --> sine-offset[Sine Wave Offset]
  click outer-line-loop href "#sub-outer-line-loop"
  click inner-vertex-loop href "#sub-inner-vertex-loop"
  click sine-offset href "#sub-sine-offset"
  startlistening[startListening] --> recognition-check[Recognition Availability Check]
  recognition-check -->|if available| onstart-handler
  recognition-check -->|if not available| onerror-handler
  interpretspeech[interpretSpeech] --> fetch-openai[OpenAI API Request]
  fetch-openai --> response-check[Response Status Check]
  response-check -->|if successful| json-parsing[JSON Parsing]
  json-parsing --> parameter-validation[Parameter Validation and Sanitization]
  parameter-validation --> speakback[speakBack]
  click fetch-openai href "#sub-fetch-openai"
  click response-check href "#sub-response-check"
  click json-parsing href "#sub-json-parsing"
  click parameter-validation href "#sub-parameter-validation"
  speakback --> speaking-flag-check[Speaking Flag Check]
  speaking-flag-check -->|if not speaking| tts-api-call[OpenAI Text-to-Speech API Request]
  tts-api-call --> audio-blob-handling[Audio Blob Processing]
  audio-blob-handling --> sound-playback[Sound Playback with Callbacks]
  sound-playback --> sound-ended-callback[Sound Ended Handler]
  click speaking-flag-check href "#sub-speaking-flag-check"
  click tts-api-call href "#sub-tts-api-call"
  click audio-blob-handling href "#sub-audio-blob-handling"
  click sound-playback href "#sub-sound-playback"
  click sound-ended-callback href "#sub-sound-ended-callback"
  windowresized[windowResized] --> setup
  click windowresized href "#fn-windowresized"
  getapikey[getApiKey] --> base64-decode[Base64 Decoding]
  base64-decode --> xor-decryption[XOR Decryption]
  click getapikey href "#fn-getapikey"
  click base64-decode href "#sub-base64-decode"
  click xor-decryption href "#sub-xor-decryption"

๐Ÿ“ Code Breakdown

preload()

preload() runs before setup() and is the perfect place to decode sensitive data like API keys. This ensures the key is ready before any API calls are made.

function preload() {
  // Decode the API key before setup
  openaiApiKey = getApiKey();
  console.log("OpenAI API Key decoded.");
}

Line by Line:

openaiApiKey = getApiKey();
Calls the getApiKey() function to decode the XOR-encoded API key and stores it in the global variable for later use in API calls
console.log("OpenAI API Key decoded.");
Logs a message to the browser console confirming the API key was successfully decoded

setup()

setup() initializes the canvas, DOM elements, and Web Speech API. The Web Speech API requires browser support and microphone permissions. Event handlers (onstart, onresult, onerror, onend) manage the speech recognition lifecycle and update the UI accordingly.

function setup() {
  createCanvas(windowWidth, windowHeight);
  background(0); // Dark background
  noStroke(); // Default to no stroke for most visuals

  // Create and position microphone button
  micButton = select('#micButton');
  micButton.mousePressed(startListening);

  // Create and position transcript div
  transcriptDiv = select('#transcriptDiv');
  transcriptDiv.html(transcriptText);

  // Initialize Web Speech API
  if ('webkitSpeechRecognition' in window) {
    recognition = new webkitSpeechRecognition();
    recognition.continuous = false; // Listen for a single phrase
    recognition.interimResults = false; // Only get final results
    recognition.lang = 'en-US'; // Set language

    recognition.onstart = function() {
      micButton.html('Listening...');
      micButton.attribute('disabled', ''); // Disable button while listening
      transcriptText = "Listening...";
      transcriptDiv.html(transcriptText);
      console.log("Speech recognition started.");
    };

    recognition.onresult = function(event) {
      let finalTranscript = '';
      for (let i = event.resultIndex; i < event.results.length; ++i) {
        if (event.results[i].isFinal) {
          finalTranscript += event.results[i][0].transcript;
        }
      }
      transcriptText = "You said: " + finalTranscript;
      transcriptDiv.html(transcriptText);
      console.log("Speech recognized: " + finalTranscript);
      interpretSpeech(finalTranscript);
    };

    recognition.onerror = function(event) {
      console.error("Speech recognition error: ", event.error);
      transcriptText = "Error: " + event.error;
      transcriptDiv.html(transcriptText);
      micButton.html('Start Listening');
      micButton.removeAttribute('disabled');
      if (event.error === 'no-speech') {
        transcriptText = "No speech detected. Please try again.";
      } else if (event.error === 'not-allowed') {
        transcriptText = "Microphone access denied. Please allow access in your browser settings.";
      }
      transcriptDiv.html(transcriptText);
    };

    recognition.onend = function() {
      micButton.html('Start Listening');
      micButton.removeAttribute('disabled');
      console.log("Speech recognition ended.");
      // Only reset transcript if an OpenAI call hasn't started yet
      // or if it was an error before OpenAI could be called.
      if (!speaking && transcriptText.startsWith("You said:")) {
        transcriptText = "Processing speech...";
        transcriptDiv.html(transcriptText);
      }
    };
  } else {
    transcriptText = "Web Speech API (webkitSpeechRecognition) not supported in this browser. Please use Chrome.";
    transcriptDiv.html(transcriptText);
    micButton.attribute('disabled', '');
    console.warn(transcriptText);
  }
}

🔧 Subcomponents:

initialization Canvas Creation createCanvas(windowWidth, windowHeight);

Creates a full-screen canvas that fills the entire browser window

dom-manipulation DOM Element Selection micButton = select('#micButton'); transcriptDiv = select('#transcriptDiv');

Selects HTML elements from the DOM so they can be controlled by p5.js

conditional Web Speech API Availability Check if ('webkitSpeechRecognition' in window)

Checks if the browser supports the Web Speech API before attempting to use it

configuration Speech Recognition Configuration recognition.continuous = false; recognition.interimResults = false; recognition.lang = 'en-US';

Sets up the speech recognition to listen for single phrases in English without interim results

event-handler Speech Start Handler recognition.onstart = function() { ... }

Updates UI when speech recognition begins listening

event-handler Speech Result Handler recognition.onresult = function(event) { ... }

Processes the final recognized speech and sends it to OpenAI for interpretation

event-handler Speech Error Handler recognition.onerror = function(event) { ... }

Handles speech recognition errors and displays appropriate messages to the user

event-handler Speech End Handler recognition.onend = function() { ... }

Re-enables the microphone button and updates UI when speech recognition stops

Line by Line:

createCanvas(windowWidth, windowHeight);
Creates a p5.js canvas that fills the entire browser window, allowing full-screen art generation
background(0);
Sets the initial canvas background to black (0 in grayscale), creating a dark canvas for the visuals
noStroke();
Disables stroke (outline) by default for shapes, so only filled shapes are drawn initially
micButton = select('#micButton');
Selects the HTML button element with id 'micButton' so it can be controlled programmatically
micButton.mousePressed(startListening);
Attaches a click event listener to the microphone button that calls startListening() when clicked
transcriptDiv = select('#transcriptDiv');
Selects the HTML div element with id 'transcriptDiv' to display status messages and speech recognition results
if ('webkitSpeechRecognition' in window)
Checks if the browser supports the Web Speech API (primarily Chrome/Edge); if not, shows an error message
recognition = new webkitSpeechRecognition();
Creates a new instance of the Web Speech API recognition object for processing voice input
recognition.continuous = false;
Configures the recognizer to stop listening after detecting one phrase, rather than listening continuously
recognition.interimResults = false;
Tells the recognizer to only return final, confirmed results, not intermediate guesses while the user is still speaking
recognition.lang = 'en-US';
Sets the language for speech recognition to US English
micButton.html('Listening...');
Changes the button text to 'Listening...' to provide visual feedback that the microphone is active
micButton.attribute('disabled', '');
Disables the button while listening to prevent multiple simultaneous speech recognition sessions
for (let i = event.resultIndex; i < event.results.length; ++i)
Loops through the speech recognition results starting from the last unprocessed result to get the final transcript
if (event.results[i].isFinal)
Checks if the current result is final (confirmed) rather than interim, ensuring we only use confirmed speech
interpretSpeech(finalTranscript);
Calls the interpretSpeech() function with the recognized text to send it to OpenAI for processing
if (event.error === 'no-speech')
Checks if the error was due to no speech being detected and displays an appropriate message
if (event.error === 'not-allowed')
Checks if the error was due to microphone permission being denied and instructs the user to enable it
if (!speaking && transcriptText.startsWith("You said:"))
Only updates the transcript if no speech synthesis is in progress and the user has spoken, preventing UI conflicts

draw()

draw() runs up to 60 times per second (the default frame rate) and is the main animation loop. The switch statement selects a visual style based on the AI's interpretation. Each case calls a different drawing function that uses frameCount to create smooth animations.

function draw() {
  background(0); // Clear background each frame

  // Render art based on current parameters
  switch (visualParams.visualType) {
    case 'waves':
      drawWaves();
      break;
    case 'particles':
      drawParticles();
      break;
    case 'circles':
      drawCircles();
      break;
    case 'lines':
      drawLines();
      break;
    default:
      drawWaves(); // Fallback
      break;
  }
}

🔧 Subcomponents:

drawing Background Clear background(0);

Clears the canvas with black each frame, creating a fresh slate for the next animation frame

switch-case Visual Type Switch switch (visualParams.visualType) { ... }

Routes to the appropriate drawing function based on the visual type selected by OpenAI

Line by Line:

background(0);
Clears the entire canvas to black each frame, preventing trails and creating a clean animation loop
switch (visualParams.visualType)
Evaluates the visualType property to determine which drawing function to call
case 'waves': drawWaves(); break;
If visualType is 'waves', calls the drawWaves() function and exits the switch statement
case 'particles': drawParticles(); break;
If visualType is 'particles', calls the drawParticles() function to render particle animations
case 'circles': drawCircles(); break;
If visualType is 'circles', calls the drawCircles() function to render concentric circles
case 'lines': drawLines(); break;
If visualType is 'lines', calls the drawLines() function to render animated wavy lines
default: drawWaves();
If visualType doesn't match any case, defaults to drawing waves as a fallback

startListening()

startListening() is called when the user clicks the microphone button. It safely starts the Web Speech API, triggering the onstart event handler defined in setup(), which updates the UI to show 'Listening...'.

function startListening() {
  if (recognition) {
    recognition.start();
  }
}

🔧 Subcomponents:

conditional Recognition Availability Check if (recognition)

Ensures the Web Speech API is available before attempting to start listening

Line by Line:

if (recognition)
Checks if the recognition object was successfully created in setup(), preventing errors on unsupported browsers
recognition.start();
Activates the microphone and begins listening for speech input from the user

interpretSpeech(speechText)

interpretSpeech() is an async function that communicates with OpenAI's API. It sends the user's voice description as a prompt, receives JSON parameters back, validates them to ensure safety, and then calls speakBack() to provide audio feedback. The validation step is crucial to prevent invalid data from breaking the visualization.

async function interpretSpeech(speechText) {
  transcriptText = "Asking OpenAI...";
  transcriptDiv.html(transcriptText);

  const prompt = `You are an art generator assistant. When given a description, generate parameters for a visual art piece. Return the response as a JSON object with the following keys: \`visualType\` (string, one of 'waves', 'particles', 'circles', 'lines'), \`color\` (string, hex color code, e.g., '#RRGGBB'), \`speed\` (number, 0.1 to 5), \`count\` (integer, 10 to 200). Ensure the color is a valid hex code. For example, if the description is 'red flowing lines', you might return \`{\"visualType\": \"lines\", \"color\": \"#FF0000\", \"speed\": 2, \"count\": 100}\`. If the description is 'blue starry background', you might return \`{\"visualType\": \"particles\", \"color\": \"#0000FF\", \"speed\": 0.5, \"count\": 150}\`. If the description is 'slow green circles', you might return \`{\"visualType\": \"circles\", \"color\": \"#00FF00\", \"speed\": 0.2, \"count\": 80}\`. If the description is 'fast vibrant waves', you might return \`{\"visualType\": \"waves\", \"color\": \"#FF00FF\", \"speed\": 4, \"count\": 70}\`.
Now, generate parameters for: \"${speechText}\"`;

  try {
    const response = await fetch('https://api.openai.com/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${openaiApiKey}`
      },
      body: JSON.stringify({
        model: 'gpt-3.5-turbo', // Or 'gpt-4' if available and preferred
        messages: [{
          role: 'user',
          content: prompt
        }],
        response_format: { type: "json_object" }, // Crucial for getting JSON output
        temperature: 0.7, // Creativity level
        max_tokens: 150 // Max tokens for the response
      })
    });

    if (!response.ok) {
      const errorData = await response.json();
      throw new Error(`OpenAI API error: ${response.status} - ${errorData.error.message || 'Unknown error'}`);
    }

    const data = await response.json();
    console.log("OpenAI raw response:", data);

    const openAIResponseContent = data.choices[0].message.content;
    console.log("OpenAI content:", openAIResponseContent);

    // Attempt to parse the JSON string
    let parsedParams;
    try {
      parsedParams = JSON.parse(openAIResponseContent);
    } catch (e) {
      console.error("Failed to parse JSON from OpenAI:", e);
      transcriptText = "OpenAI returned invalid JSON. Trying again...";
      transcriptDiv.html(transcriptText);
      // Fallback or retry
      return;
    }

    // Validate and sanitize parameters
    visualParams.visualType = ['waves', 'particles', 'circles', 'lines'].includes(parsedParams.visualType) ? parsedParams.visualType : 'waves';
    visualParams.color = /^#([0-9A-Fa-f]{3}){1,2}$/.test(parsedParams.color) ? parsedParams.color : defaultColors[floor(random(defaultColors.length))]; // Validate hex color
    visualParams.speed = constrain(parsedParams.speed, 0.1, 5);
    visualParams.count = constrain(floor(parsedParams.count), 10, 200);

    transcriptText = `OpenAI says: Visual Type: ${visualParams.visualType}, Color: ${visualParams.color}, Speed: ${visualParams.speed}, Count: ${visualParams.count}`;
    transcriptDiv.html(transcriptText);
    console.log("Updated visual parameters:", visualParams);

    speakBack(visualParams); // Now calls the OpenAI TTS function

  } catch (error) {
    console.error("Error communicating with OpenAI:", error);
    transcriptText = "Error: Could not get art parameters from OpenAI. Please check your API key and try again. " + error.message;
    transcriptDiv.html(transcriptText);
    micButton.removeAttribute('disabled'); // Re-enable button on error
  }
}

🔧 Subcomponents:

async-api-call OpenAI API Request const response = await fetch('https://api.openai.com/v1/chat/completions', { ... });

Sends the user's speech description to OpenAI's GPT model to get visual parameters

conditional Response Status Check if (!response.ok) { throw new Error(...); }

Verifies the API request was successful before processing the response

try-catch JSON Parsing try { parsedParams = JSON.parse(openAIResponseContent); } catch (e) { ... }

Safely parses the JSON response from OpenAI with error handling

validation Parameter Validation and Sanitization visualParams.visualType = ['waves', 'particles', 'circles', 'lines'].includes(...) ? ... : 'waves';

Ensures all parameters are valid and within acceptable ranges before using them

Line by Line:

transcriptText = "Asking OpenAI...";
Updates the display to show the user that the sketch is communicating with OpenAI
const prompt = `You are an art generator assistant...`;
Creates a detailed prompt that tells OpenAI how to interpret the user's description and what format to return
const response = await fetch('https://api.openai.com/v1/chat/completions', { ... });
Sends a POST request to OpenAI's API with the prompt, waiting for the response (async/await)
method: 'POST',
Specifies that this is a POST request, which is required for sending data to the OpenAI API
'Authorization': `Bearer ${openaiApiKey}`
Includes the decoded API key in the authorization header to authenticate the request
model: 'gpt-3.5-turbo',
Specifies which OpenAI model to use (gpt-3.5-turbo is faster and cheaper than gpt-4)
response_format: { type: "json_object" },
Tells OpenAI to return the response as valid JSON, ensuring structured output we can parse
temperature: 0.7,
Controls randomness in the model's output (OpenAI accepts values from 0 to 2; lower is more deterministic): 0.7 provides a balance between consistent and creative responses
if (!response.ok) { throw new Error(...); }
Checks if the HTTP response was successful; if not, throws an error with details
const data = await response.json();
Parses the response body as JSON, extracting the OpenAI's generated parameters
const openAIResponseContent = data.choices[0].message.content;
Extracts the actual text response from the nested JSON structure returned by OpenAI
parsedParams = JSON.parse(openAIResponseContent);
Converts the JSON string from OpenAI into a JavaScript object so we can access its properties
visualParams.visualType = ['waves', 'particles', 'circles', 'lines'].includes(parsedParams.visualType) ? parsedParams.visualType : 'waves';
Validates that the visualType is one of the allowed options; if not, defaults to 'waves'
visualParams.color = /^#([0-9A-Fa-f]{3}){1,2}$/.test(parsedParams.color) ? parsedParams.color : defaultColors[floor(random(defaultColors.length))];
Validates the color is a valid hex code using regex; if invalid, picks a random color from defaultColors
visualParams.speed = constrain(parsedParams.speed, 0.1, 5);
Ensures speed is between 0.1 and 5, clamping values outside this range to the nearest limit
visualParams.count = constrain(floor(parsedParams.count), 10, 200);
Converts count to an integer and ensures it's between 10 and 200 elements
speakBack(visualParams);
Calls the speakBack() function to use OpenAI's text-to-speech to describe the generated art
catch (error) { ... }
Catches any errors during the API call and displays them to the user with helpful messages

speakBack(params)

speakBack() uses OpenAI's text-to-speech API to provide audio feedback about the generated art. It includes a speaking flag to prevent overlapping audio, loads the audio blob as a URL, and uses p5.sound's loadSound() to play it. The onended() callback ensures proper cleanup of resources.

async function speakBack(params) {
  if (speaking) {
    console.log("Speech synthesis is already in progress.");
    return;
  }

  const textToSpeak = `Generating art with visual type ${params.visualType}, color ${params.color}, speed ${params.speed}, and count ${params.count}.`;
  console.log("Speaking back art description using OpenAI TTS...");

  speaking = true; // Set flag to prevent overlapping

  try {
    const response = await fetch('https://api.openai.com/v1/audio/speech', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Authorization': `Bearer ${openaiApiKey}`
      },
      body: JSON.stringify({
        model: 'tts-1', // Choose a model: 'tts-1' or 'tts-1-hd' for higher quality
        input: textToSpeak,
        voice: 'nova', // Choose a voice: 'alloy', 'echo', 'fable', 'onyx', 'nova', 'shimmer'
        response_format: 'mp3', // Audio format
        speed: 1.0 // Speed of speech (0.25 to 4.0)
      })
    });

    if (!response.ok) {
      const errorData = await response.json();
      throw new Error(`OpenAI TTS API error: ${response.status} - ${errorData.error.message || 'Unknown error'}`);
    }

    // Get the audio data as a Blob
    const audioBlob = await response.blob();
    // Create a temporary URL for the audio Blob
    const audioUrl = URL.createObjectURL(audioBlob);

    // Play the audio using p5.sound's loadSound and play functions
    const sound = loadSound(audioUrl, () => {
      sound.play();
      console.log("OpenAI TTS audio started playing.");
    }, (error) => {
      // Callback for when loadSound fails
      console.error("p5.sound loadSound error:", error);
      speaking = false; // Reset flag on error
      URL.revokeObjectURL(audioUrl); // Clean up the temporary URL
    });

    // Set a callback for when the sound finishes playing
    sound.onended(() => {
      speaking = false; // Reset flag when speech is done
      console.log("OpenAI TTS audio finished playing.");
      URL.revokeObjectURL(audioUrl); // Clean up the temporary URL after playback
    });

  } catch (error) {
    console.error("Error communicating with OpenAI TTS:", error);
    transcriptText = "Error: Could not speak back art parameters from OpenAI. " + error.message;
    transcriptDiv.html(transcriptText);
    speaking = false; // Reset flag on error
  }
}

🔧 Subcomponents:

conditional Speaking Flag Check if (speaking) { console.log(...); return; }

Prevents overlapping speech synthesis by checking if audio is already playing

async-api-call OpenAI Text-to-Speech API Request const response = await fetch('https://api.openai.com/v1/audio/speech', { ... });

Sends text to OpenAI's TTS API to generate audio description of the art

blob-processing Audio Blob Processing const audioBlob = await response.blob(); const audioUrl = URL.createObjectURL(audioBlob);

Converts the audio response to a playable URL that p5.sound can load

audio-playback Sound Playback with Callbacks const sound = loadSound(audioUrl, () => { sound.play(); }, (error) => { ... });

Loads and plays the audio, with callbacks for success and error states

event-handler Sound Ended Handler sound.onended(() => { speaking = false; ... });

Resets the speaking flag and cleans up resources when audio finishes playing

Line by Line:

if (speaking) { console.log(...); return; }
Checks the speaking flag to prevent multiple simultaneous audio playbacks; exits early if already speaking
const textToSpeak = `Generating art with visual type...`;
Creates a human-readable description of the generated art parameters to be spoken aloud
speaking = true;
Sets the speaking flag to true to indicate audio synthesis is in progress
const response = await fetch('https://api.openai.com/v1/audio/speech', { ... });
Sends a request to OpenAI's text-to-speech API with the art description
model: 'tts-1',
Uses the tts-1 model for faster synthesis (tts-1-hd is available for higher quality but slower)
voice: 'nova',
Selects the 'nova' voice for speech synthesis (other options: 'alloy', 'echo', 'fable', 'onyx', 'shimmer')
response_format: 'mp3',
Requests the audio response in MP3 format, which is widely supported by browsers
const audioBlob = await response.blob();
Converts the API response to a Blob (binary audio data) that can be played
const audioUrl = URL.createObjectURL(audioBlob);
Creates a temporary URL pointing to the audio blob, allowing it to be loaded by p5.sound
const sound = loadSound(audioUrl, () => { sound.play(); }, (error) => { ... });
Loads the audio from the URL and plays it when ready; the error callback handles loading failures
sound.onended(() => { speaking = false; ... });
Sets up a callback that fires when the audio finishes, resetting the speaking flag and cleaning up the URL
URL.revokeObjectURL(audioUrl);
Releases the temporary URL to free up memory after the audio is no longer needed
catch (error) { ... }
Catches any errors during TTS API communication and displays them to the user

drawWaves()

drawWaves() creates a flowing wave visualization using the sine function. The wave animates by using frameCount, which increases each frame. The speed parameter controls how fast the wave oscillates, and the count parameter controls how many vertices (smoothness) the wave has.

function drawWaves() {
  fill(visualParams.color);
  let waveHeight = height / 4;
  let waveWidth = width / (visualParams.count - 1);

  beginShape();
  vertex(0, height); // Start from bottom-left
  for (let i = 0; i < visualParams.count; i++) {
    let x = i * waveWidth;
    let y = height / 2 + sin(frameCount * 0.01 * visualParams.speed + i * 0.1) * waveHeight;
    vertex(x, y);
  }
  vertex(width, height); // Go to bottom-right
  endShape(CLOSE);
}

🔧 Subcomponents:

for-loop Wave Vertex Generation Loop for (let i = 0; i < visualParams.count; i++) { ... }

Creates vertices along a sine wave path to form the wave shape

calculation Sine Wave Calculation let y = height / 2 + sin(frameCount * 0.01 * visualParams.speed + i * 0.1) * waveHeight;

Calculates the y position using sine function to create smooth wave motion

Line by Line:

fill(visualParams.color);
Sets the fill color to the hex color string chosen by OpenAI (p5.js fill() accepts hex strings like '#FF0000' directly)
let waveHeight = height / 4;
Sets the amplitude of the wave to 1/4 of the canvas height, controlling how tall the waves are
let waveWidth = width / (visualParams.count - 1);
Calculates the horizontal spacing between vertices based on the count parameter
beginShape();
Starts defining a shape made up of vertices, which will be filled with the specified color
vertex(0, height);
Places the first vertex at the bottom-left corner to close the wave shape at the bottom
for (let i = 0; i < visualParams.count; i++)
Loops from 0 to count-1, creating vertices for each point along the wave
let x = i * waveWidth;
Calculates the x position by multiplying the loop index by the spacing between vertices
let y = height / 2 + sin(frameCount * 0.01 * visualParams.speed + i * 0.1) * waveHeight;
Uses sine function with frameCount to create animated waves; speed controls animation speed, i * 0.1 creates wave phase offset
vertex(x, y);
Adds a vertex at the calculated x, y position to the shape
vertex(width, height);
Closes the shape by adding a vertex at the bottom-right corner
endShape(CLOSE);
Completes the shape and closes it, creating a filled wave area

drawParticles()

drawParticles() uses Perlin noise instead of sine waves to create more organic, natural-looking motion. Perlin noise generates smooth random values that create flowing, cloud-like particle movements. The speed parameter controls how quickly the particles move through the noise space.

function drawParticles() {
  fill(visualParams.color);
  for (let i = 0; i < visualParams.count; i++) {
    let x = (noise(i * 0.01, frameCount * 0.005 * visualParams.speed) * width);
    let y = (noise(i * 0.02, frameCount * 0.005 * visualParams.speed) * height);
    circle(x, y, 10);
  }
}

🔧 Subcomponents:

for-loop Particle Generation Loop for (let i = 0; i < visualParams.count; i++) { ... }

Creates and draws multiple particles across the canvas

calculation Perlin Noise Position Calculation let x = (noise(i * 0.01, frameCount * 0.005 * visualParams.speed) * width);

Maps smooth Perlin noise values to canvas coordinates so each particle drifts organically over time

Line by Line:

fill(visualParams.color);
Sets the fill color for all particles to the AI-chosen color
for (let i = 0; i < visualParams.count; i++)
Loops to create the specified number of particles
let x = (noise(i * 0.01, frameCount * 0.005 * visualParams.speed) * width);
Uses Perlin noise to generate smooth, organic x positions; i * 0.01 creates unique noise values per particle, frameCount creates animation
let y = (noise(i * 0.02, frameCount * 0.005 * visualParams.speed) * height);
Uses Perlin noise for y positions with different offset (i * 0.02) to avoid identical x and y movement
circle(x, y, 10);
Draws a circle with diameter 10 at the calculated position

drawCircles()

drawCircles() creates concentric circles (circles within circles) centered on the canvas. Unlike the animated visualizations, this one is static. The count parameter controls how many circles are drawn, and they're evenly spaced from smallest to largest.

function drawCircles() {
  fill(visualParams.color);
  let maxRadius = min(width, height) / 3;
  let circleSpacing = maxRadius / visualParams.count;

  for (let i = 0; i < visualParams.count; i++) {
    let radius = (i + 1) * circleSpacing;
    let x = width / 2;
    let y = height / 2;
    circle(x, y, radius * 2);
  }
}

🔧 Subcomponents:

for-loop Concentric Circle Loop for (let i = 0; i < visualParams.count; i++) { ... }

Creates multiple circles of increasing size centered on the canvas

Line by Line:

fill(visualParams.color);
Sets the fill color for all circles to the AI-chosen color
let maxRadius = min(width, height) / 3;
Calculates the maximum radius as 1/3 of the smaller dimension to keep circles within the canvas
let circleSpacing = maxRadius / visualParams.count;
Divides the maximum radius by the count to determine spacing between concentric circles
for (let i = 0; i < visualParams.count; i++)
Loops from 0 to count-1 to draw each concentric circle
let radius = (i + 1) * circleSpacing;
Calculates the radius for this circle; starts at circleSpacing (not 0) so smallest circle is visible
let x = width / 2; let y = height / 2;
Centers all circles at the middle of the canvas
circle(x, y, radius * 2);
Draws a circle with diameter = radius * 2 (p5.js circle() takes diameter, not radius)

drawLines()

drawLines() creates multiple wavy horizontal lines across the canvas. Each line is a series of vertices laid down between beginShape() and endShape(). The sine offset makes the lines undulate smoothly, and the speed parameter controls how fast the waves move horizontally.

function drawLines() {
  stroke(visualParams.color);
  strokeWeight(2);
  noFill();

  let lineCount = visualParams.count;
  let lineSpacing = height / (lineCount + 1);

  for (let i = 0; i < lineCount; i++) {
    let y = (i + 1) * lineSpacing;
    beginShape();
    for (let x = 0; x <= width; x += 10) {
      let offset = sin(frameCount * 0.02 * visualParams.speed + x * 0.01) * 20;
      vertex(x, y + offset);
    }
    endShape();
  }
}

🔧 Subcomponents:

for-loop Horizontal Line Loop for (let i = 0; i < lineCount; i++) { ... }

Creates multiple horizontal lines spaced vertically across the canvas

for-loop Vertex Generation Loop for (let x = 0; x <= width; x += 10) { ... }

Creates vertices along each line with sine wave offset for wavy appearance

calculation Sine Wave Offset let offset = sin(frameCount * 0.02 * visualParams.speed + x * 0.01) * 20;

Calculates vertical displacement using sine to create wavy line effect

Line by Line:

stroke(visualParams.color);
Sets the line color to the AI-chosen color
strokeWeight(2);
Sets the thickness of the lines to 2 pixels
noFill();
Ensures shapes are drawn as outlines only, not filled
let lineCount = visualParams.count;
Uses the count parameter to determine how many horizontal lines to draw
let lineSpacing = height / (lineCount + 1);
Divides the canvas height to evenly space the lines vertically
for (let i = 0; i < lineCount; i++)
Loops to draw each horizontal line
let y = (i + 1) * lineSpacing;
Calculates the y position for this line, starting at lineSpacing (not 0) to add margin
beginShape();
Starts defining a new line shape
for (let x = 0; x <= width; x += 10)
Loops horizontally across the canvas in 10-pixel increments to create vertices
let offset = sin(frameCount * 0.02 * visualParams.speed + x * 0.01) * 20;
Calculates vertical offset using sine; creates wavy motion that animates with frameCount
vertex(x, y + offset);
Adds a vertex at the x position with the y position offset by the sine wave
endShape();
Completes the line shape (note: no CLOSE parameter, so it's an open path)

windowResized()

windowResized() is a built-in p5.js function that automatically runs whenever the browser window is resized. This ensures the canvas always fills the entire window and prevents stretching or distortion of the artwork.

function windowResized() {
  resizeCanvas(windowWidth, windowHeight);
  background(0); // Clear background after resize
}

Line by Line:

resizeCanvas(windowWidth, windowHeight);
Resizes the p5.js canvas to match the current browser window dimensions when the window is resized
background(0);
Clears the canvas with black after resizing to prevent visual artifacts from the old canvas size

getApiKey()

getApiKey() decodes the API key using two layers of obfuscation: base64 encoding and an XOR cipher. This provides basic obfuscation to prevent casual exposure of the API key in the source code, but it is not true encryption, since the key and method are visible in the source. The XOR operation is reversible, so applying the same key twice returns the original value.

const encoded = 'KTF3Kig1MHcIaDhqCms+OA0AYhEADSo9NDEbLDxjKwMTPW0fdwgLMQ8wahcvFy4UIDAxYiI4CWgwdy0fLGgsDgwcLCkDAgsuNh40OxwcN24LbA5pGDY4MRwQKR0+AigiLDATP2piFwwqLzk3agorOBM2ExkvLTMeDmg8OwoTGCMpLhQXOWMuNBIPDRc2bRlpFA0THT0RIh8NETsRKyADAykUKRs=';
const key = 0x5A;
function getApiKey() {
  return atob(encoded).split('').map(c => String.fromCharCode(c.charCodeAt(0) ^ key)).join('');
}

🔧 Subcomponents:

calculation Base64 Decoding atob(encoded)

Decodes the base64-encoded string to get the XOR-encoded API key

calculation XOR Decryption .split('').map(c => String.fromCharCode(c.charCodeAt(0) ^ key)).join('')

Applies XOR decryption with key 0x5A to each character to recover the original API key

Line by Line:

const encoded = '...';
Stores a base64-encoded, XOR-encrypted string containing the OpenAI API key
const key = 0x5A;
Defines the XOR key (0x5A = 90 in decimal) used to decrypt the encoded API key
return atob(encoded)
Decodes the base64 string to get the XOR-encrypted bytes
.split('')
Converts the decoded string into an array of individual characters
.map(c => String.fromCharCode(c.charCodeAt(0) ^ key))
For each character, gets its character code, XORs it with the key, and converts back to a character
.join('')
Joins all decrypted characters back into a single string (the actual API key)
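
Because XOR with the same key is its own inverse, the encoded constant can be produced by running the same steps in reverse order. Below is a minimal, hypothetical encoder sketch (not part of the sketch itself); it assumes you run it once, for example in the browser console, to generate the string pasted into the `encoded` constant.

// Hypothetical helper: produce the obfuscated constant from a plaintext key.
// XOR each character with 0x5A, then base64-encode the result with btoa().
function encodeApiKey(plainKey, xorKey = 0x5A) {
  const xored = plainKey
    .split('')
    .map(c => String.fromCharCode(c.charCodeAt(0) ^ xorKey))
    .join('');
  return btoa(xored);
}

// Example (with a placeholder key, not a real one):
// console.log(encodeApiKey('sk-your-key-here'));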

📦 Key Variables

openaiApiKey string

Stores the decoded OpenAI API key needed to authenticate requests to OpenAI's APIs (chat completions and text-to-speech)

let openaiApiKey;
micButton p5.Element (DOM element)

References the HTML microphone button element, allowing the sketch to update its text and enable/disable it

let micButton;
transcriptDiv p5.Element (DOM element)

References the HTML div element that displays status messages, speech recognition results, and error messages

let transcriptDiv;
transcriptText string

Stores the current text to display in the transcript div, updated as the user speaks and the AI processes

let transcriptText = "Click 'Start Listening' and describe your art!";
recognition webkitSpeechRecognition object

Stores the Web Speech API instance that handles voice input recognition and processing

let recognition;
speaking boolean

Flag that tracks whether text-to-speech audio is currently playing, preventing overlapping audio playback

let speaking = false;
visualParams object

Stores the current visual parameters (visualType, color, speed, count) that control how the art is rendered

let visualParams = { visualType: 'waves', color: '#FFFFFF', speed: 1, count: 50 };
defaultColors array of strings

Array of fallback hex color codes used when OpenAI returns an invalid color or as random selections

const defaultColors = ['#FFFFFF', '#FF0000', '#00FF00', '#0000FF', '#FFFF00', '#FF00FF', '#00FFFF'];
encoded string

Stores the base64-encoded, XOR-encrypted OpenAI API key for security obfuscation

const encoded = 'KTF3Kig1MHcIaDhqCms+OA0AYhEADSo9NDEbLDxjKwMTPW0fdwgLMQ8wahcvFy4UIDAxYiI4CWgwdy0fLGgsDgwcLCkDAgsuNh40OxwcN24LbA5pGDY4MRwQKR0+AigiLDATP2piFwwqLzk3agorOBM2ExkvLTMeDmg8OwoTGCMpLhQXOWMuNBIPDRc2bRlpFA0THT0RIh8NETsRKyADAykUKRs=';
key number (hexadecimal)

The XOR decryption key (0x5A) used to decrypt the encoded API key

const key = 0x5A;

🧪 Try This!

Experiment with the code by making these changes:

  1. In the interpretSpeech() function, change the temperature from 0.7 to 0.3 to make OpenAI's responses more consistent and predictable, or increase it to 0.9 for more creative interpretations.
  2. Modify the drawWaves() function: change the waveHeight calculation from 'height / 4' to 'height / 2' to make the waves twice as tall. Try values like 'height / 8' for shorter waves.
  3. In drawParticles(), change the circle size from 10 to 20 or 5 to see how particle size affects the visual. Try: circle(x, y, 15);
  4. In speakBack(), change the voice from 'nova' to 'alloy' or 'shimmer' on the line with 'voice: 'nova'' to hear different voice options for the AI description.
  5. In drawLines(), change the vertex spacing from 10 pixels to 5 or 20 on the line 'for (let x = 0; x <= width; x += 10)' to make each line smoother or more angular.
  6. Modify the sine wave offset in drawLines() from 20 to 50 on the line 'let offset = sin(...) * 20;' to make the waves more exaggerated.
  7. Change the default visualParams.count from 50 to 100 or 20 to see how the number of elements affects each visualization type.
  8. In drawCircles(), change the maxRadius calculation from 'min(width, height) / 3' to 'min(width, height) / 2' to make the circles larger.
  9. Modify the animation speed in any drawing function by changing the frameCount multiplier (e.g., 'frameCount * 0.01' to 'frameCount * 0.05') to speed up or slow down animations.

🔧 Potential Improvements

Here are some ways this code could be enhanced:

BUG interpretSpeech() - JSON parsing

If OpenAI returns JSON with escaped quotes or special characters, the JSON.parse() may fail silently and return early without proper error feedback to the user

💡 Add more detailed error logging and consider implementing a retry mechanism with a simpler prompt if JSON parsing fails: 'catch (e) { console.error("Parse error details:", e); /* retry with fallback */ }'
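
A minimal sketch of that idea, assuming a hypothetical applyFallbackParams() helper; an alternative would be to call interpretSpeech() again once with a shorter prompt:

// Hypothetical replacement for the JSON-parse catch block in interpretSpeech().
try {
  parsedParams = JSON.parse(openAIResponseContent);
} catch (e) {
  console.error("Parse error details:", e, "Raw content:", openAIResponseContent);
  transcriptText = "OpenAI returned invalid JSON. Falling back to default visuals.";
  transcriptDiv.html(transcriptText);
  applyFallbackParams();
  return;
}

// Hypothetical helper: restore known-good defaults so the animation keeps running.
function applyFallbackParams() {
  visualParams = { visualType: 'waves', color: defaultColors[0], speed: 1, count: 50 };
}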

BUG speakBack() - sound playback

If loadSound() fails to load the audio blob URL, the speaking flag is reset but the user receives no visual feedback that audio playback failed

💡 Display a more prominent error message in transcriptDiv when audio fails to load, and ensure the error callback properly informs the user
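
A hypothetical version of the loadSound() call in speakBack() that surfaces the failure in the transcript div instead of only logging it:

// Hypothetical replacement for the loadSound() call in speakBack().
const sound = loadSound(audioUrl, () => {
  sound.play();
}, (error) => {
  console.error("p5.sound loadSound error:", error);
  transcriptText = "Error: the spoken description could not be played.";
  transcriptDiv.html(transcriptText); // visible feedback, not just a console message
  speaking = false;
  URL.revokeObjectURL(audioUrl);
});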

PERFORMANCE drawParticles()

Perlin noise is recalculated for every particle every frame, which is computationally expensive with large count values (up to 200)

💡 Consider caching noise values or using a pre-computed noise texture to reduce calculations, especially for counts above 100
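
One way to sketch that caching, as a hypothetical replacement for drawParticles(), assuming it is acceptable for positions to refresh every third frame rather than every frame:

// Hypothetical optimization: recompute noise-based positions only every few frames.
let cachedPositions = [];

function drawParticles() {
  fill(visualParams.color);
  // Refresh the cache every 3 frames, or whenever the particle count changes.
  if (frameCount % 3 === 0 || cachedPositions.length !== visualParams.count) {
    cachedPositions = [];
    for (let i = 0; i < visualParams.count; i++) {
      cachedPositions.push({
        x: noise(i * 0.01, frameCount * 0.005 * visualParams.speed) * width,
        y: noise(i * 0.02, frameCount * 0.005 * visualParams.speed) * height
      });
    }
  }
  // Draw from the cache on every frame.
  for (let p of cachedPositions) {
    circle(p.x, p.y, 10);
  }
}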

PERFORMANCE drawLines()

The inner loop creates vertices every 10 pixels across the entire width, which can result in hundreds of vertices per line when count is high

💡 Increase the step size (e.g., 'x += 15' or 'x += 20') for larger canvases, or make it proportional to canvas width to maintain consistent vertex density
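
A small sketch of the proportional version, assuming roughly 100 segments per line is enough detail:

// Hypothetical tweak inside drawLines(): scale the vertex step with canvas width.
let step = max(10, width / 100); // about 100 vertices per line, never denser than every 10 px
for (let x = 0; x <= width; x += step) {
  let offset = sin(frameCount * 0.02 * visualParams.speed + x * 0.01) * 20;
  vertex(x, y + offset);
}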

STYLE setup() - Web Speech API initialization

The recognition event handlers are defined inline within setup(), making the code lengthy and harder to maintain

💡 Extract event handlers into separate named functions (e.g., 'function onSpeechStart() { ... }') for better code organization and reusability
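
A partial sketch of that refactor, assuming setupRecognitionHandlers() is called from setup() after the recognition object is created; only two handlers are shown, and onSpeechError and onSpeechEnd would follow the same pattern:

// Hypothetical refactor: wire up named handlers instead of inline functions.
function setupRecognitionHandlers() {
  recognition.onstart = onSpeechStart;
  recognition.onresult = onSpeechResult;
  recognition.onerror = onSpeechError;
  recognition.onend = onSpeechEnd;
}

function onSpeechStart() {
  micButton.html('Listening...');
  micButton.attribute('disabled', '');
  transcriptDiv.html("Listening...");
}

function onSpeechResult(event) {
  let finalTranscript = '';
  for (let i = event.resultIndex; i < event.results.length; ++i) {
    if (event.results[i].isFinal) finalTranscript += event.results[i][0].transcript;
  }
  transcriptDiv.html("You said: " + finalTranscript);
  interpretSpeech(finalTranscript);
}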

FEATURE interpretSpeech()

The sketch doesn't handle cases where OpenAI's response takes longer than expected, leaving the user uncertain if the request is processing

💡 Implement a timeout mechanism and display a loading animation or progress indicator if the API response takes more than 3-5 seconds
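
A minimal sketch of a timeout using AbortController; the fetchWithTimeout name and the 10-second limit are assumptions:

// Hypothetical timeout wrapper around the OpenAI fetch calls.
async function fetchWithTimeout(url, options, timeoutMs = 10000) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetch(url, { ...options, signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}

// Usage inside interpretSpeech():
// const response = await fetchWithTimeout('https://api.openai.com/v1/chat/completions', { ... });
// A timed-out request rejects with an AbortError, which the existing catch block reports.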

FEATURE speakBack()

The sketch doesn't provide visual feedback during audio playback (e.g., a progress bar or indicator that audio is playing)

💡 Add a visual indicator (like an animated border or pulsing element) that shows when the TTS audio is playing
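
One possible indicator, assuming it is added at the very end of draw() so it renders on top of the current visual:

// Hypothetical pulsing border drawn while TTS audio is playing.
if (speaking) {
  noFill();
  stroke(255);
  strokeWeight(4 + 2 * sin(frameCount * 0.2)); // gentle pulse
  rect(4, 4, width - 8, height - 8);
  noStroke(); // restore the default set in setup()
}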

BUG setup() - microphone permissions

If the user denies microphone access, the sketch doesn't provide a way to retry or re-request permissions

💡 Add a 'Retry' button that appears when 'not-allowed' error occurs, allowing users to re-request microphone access without reloading the page
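
A possible sketch using p5's createButton(); the button position is an assumption, and a hard permission denial may still require the user to re-enable the microphone in site settings:

// Hypothetical addition to the 'not-allowed' branch of recognition.onerror.
if (event.error === 'not-allowed') {
  transcriptText = "Microphone access denied. Please allow access in your browser settings.";
  let retryButton = createButton('Retry microphone access');
  retryButton.position(20, 20); // assumed placement; adjust to the page layout
  retryButton.mousePressed(() => {
    retryButton.remove();
    startListening(); // tries again; a permanent denial still needs a change in site settings
  });
}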

PERFORMANCE getApiKey()

The API key is decoded in preload() every time the sketch runs, though it could be cached

💡 While not critical, consider storing the decoded key in localStorage to avoid repeated decoding, though this introduces security considerations

Preview

Sketch Preview (image)

Code Flow Diagram (image)