OpenAI's CLIP model ported to JavaScript using the ONNX web runtime
Awesome project! I'm trying to use the tflite model that comes out of the conversion, but its output doesn't match the original model's. After converting to tflite, I use the data from the [OpenAI PyTorch example](https://colab.research.google.com/github/openai/clip/blob/master/notebooks/Interacting_with_CLIP.ipynb):

```python
import numpy as np
import tensorflow as tf
import clip

text_model_path = 'clip-text-vit-32.tflite'

# Load the TFLite model and allocate tensors.
text_interpreter = tf.lite.Interpreter(model_path=text_model_path)
text_interpreter.allocate_tensors()

# Get input and output tensor details.
text_input_details = text_interpreter.get_input_details()
text_output_details = text_interpreter.get_output_details()

text_token = clip.tokenize(['This is a page of text about segmentation'])
text_input = np.array(text_token)

text_interpreter.set_tensor(text_input_details[0]['index'], text_input)
text_interpreter.invoke()

# Read the output tensor back from the interpreter.
text_output = text_interpreter.get_tensor(text_output_details[0]['index'])
print(text_output[0, :10])
```

which gives:

```
[-0.1661  0.0545 -0.1515  0.4507  0.207  -0.2947  0.0406 -0.4087 -0.151   0.3198]
```

For comparison, the [tutorial notebook](https://colab.research.google.com/github/openai/clip/blob/master/notebooks/Interacting_with_CLIP.ipynb) runs the same calculation with the original model:

```python
text_token = clip.tokenize(['This is a page of text about segmentation']).cuda()
text_feature = model.encode_text(text_token).float()
print(np.array(text_feature.tolist())[0, :10])
```

which gives:

```
array([-8.46557617e-02,  3.23486328e-01,  9.23461914e-02, -2.18261719e-01,
        9.08203125e-02,  1.81152344e-01, -7.84397125e-04, -8.35449219e-01,
        6.68945312e-01, -4.18945312e-01])
```
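One way to quantify how far apart the two outputs are is to compare the embeddings directly, e.g. with cosine similarity. Below is a minimal NumPy-only sketch (no model required) applied to the two truncated 10-dimensional vectors quoted above; if the conversion were faithful, the similarity over the full 512-dimensional embeddings would be close to 1.0:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# First 10 dims of each embedding, copied from the outputs quoted above.
tflite_out = np.array([-0.1661, 0.0545, -0.1515, 0.4507, 0.207,
                       -0.2947, 0.0406, -0.4087, -0.151, 0.3198])
torch_out = np.array([-8.46557617e-02, 3.23486328e-01, 9.23461914e-02,
                      -2.18261719e-01, 9.08203125e-02, 1.81152344e-01,
                      -7.84397125e-04, -8.35449219e-01, 6.68945312e-01,
                      -4.18945312e-01])

sim = cosine_similarity(tflite_out, torch_out)
print(f"cosine similarity (first 10 dims): {sim:.4f}")
```

On these truncated slices the similarity is near zero, which suggests a genuine mismatch in the converted model rather than mere floating-point drift.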
Opened by shortcipher3; 5 comments.