
Voice assistants: SesameAI's CSM and Kyutai's Moshi

by 조병희 2025. 3. 16.

SesameAILabs - CSM

GitHub - SesameAILabs/csm: A Conversational Speech Generation Model

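Following the setup from the GitHub repository as-is installs the CPU-only build of PyTorch, so CUDA is unavailable even though the machine has an NVIDIA GPU. The session below shows the problem and the fix: uninstall torch and reinstall it from the cu121 wheel index.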

(csm) D:\workspace\csm>pip install torchtriton
ERROR: Could not find a version that satisfies the requirement torchtriton (from versions: none)
ERROR: No matching distribution found for torchtriton

(csm) D:\workspace\csm>nvidia-smi
Sun Mar 16 21:57:35 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 566.36                 Driver Version: 566.36         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1660 Ti   WDDM  |   00000000:01:00.0  On |                  N/A |
| N/A   60C    P0             23W /   80W |    1624MiB /   6144MiB |      2%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

(csm) D:\workspace\csm>python -c "import torch; print(torch.__version__)"
2.4.0+cpu

(csm) D:\workspace\csm>python -c "import torch; print(torch.cuda.is_available())"
False

(csm) D:\workspace\csm>pip uninstall torch torchvision torchaudio -y
Found existing installation: torch 2.4.0
Uninstalling torch-2.4.0:
  Successfully uninstalled torch-2.4.0
WARNING: Skipping torchvision as it is not installed.
Found existing installation: torchaudio 2.4.0
Uninstalling torchaudio-2.4.0:
  Successfully uninstalled torchaudio-2.4.0

(csm) D:\workspace\csm>pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
...
Installing collected packages: sympy, torch, torchvision, torchaudio
  Attempting uninstall: sympy
    Found existing installation: sympy 1.13.3
    Uninstalling sympy-1.13.3:
      Successfully uninstalled sympy-1.13.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
silentcipher 1.0.5 requires torch==2.4.0, but you have torch 2.5.1+cu121 which is incompatible.
silentcipher 1.0.5 requires torchaudio==2.4.0, but you have torchaudio 2.5.1+cu121 which is incompatible.
Successfully installed sympy-1.13.1 torch-2.5.1+cu121 torchaudio-2.5.1+cu121 torchvision-0.20.1+cu121

(csm) D:\workspace\csm>python -c "import torch; print(torch.cuda.is_available())"
True

(csm) D:\workspace\csm>pip uninstall silentcipher
Found existing installation: silentcipher 1.0.5
Uninstalling silentcipher-1.0.5:
  Would remove:
    c:\users\bhjo0\anaconda3\envs\csm\lib\site-packages\silentcipher-1.0.5.dist-info\*
    c:\users\bhjo0\anaconda3\envs\csm\lib\site-packages\silentcipher\*
Proceed (Y/n)? y
  Successfully uninstalled silentcipher-1.0.5

(csm) D:\workspace\csm>python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
2.5.1+cu121
True
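
The `+cpu` and `+cu121` suffixes in the version strings above are local build tags, and they line up with `torch.cuda.is_available()` going from False to True. As a minimal sketch, the two one-liner checks can be folded into one script. The helper names `build_variant`, `is_cuda_build`, and `report` are made up for illustration; they are not part of PyTorch or the CSM repository.

```python
import re


def build_variant(version: str) -> str:
    """Return the local build tag of a version string, e.g. 'cpu' or 'cu121'."""
    # The local tag is whatever follows '+'; some wheels carry no tag at all.
    _, sep, local = version.partition("+")
    return local if sep else "unknown"


def is_cuda_build(version: str) -> bool:
    """True when the build tag looks like a CUDA tag such as 'cu121'."""
    return re.fullmatch(r"cu\d+", build_variant(version)) is not None


def report() -> str:
    """Summarize the installed torch build (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    lines = [
        f"torch {torch.__version__} (build tag: {build_variant(torch.__version__)})",
        f"CUDA available: {torch.cuda.is_available()}",
    ]
    if torch.cuda.is_available():
        lines.append(f"device 0: {torch.cuda.get_device_name(0)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(report())
```

A `cpu` tag means no driver or GPU setting will make CUDA available; the wheel itself has to be replaced, which is what the reinstall from the cu121 index does.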


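watermarking.py uses silentcipher, but silentcipher 1.0.5 pins torch==2.4.0 and torchaudio==2.4.0, which conflicts with the 2.5.1+cu121 install. I uninstalled it and edited the source as follows.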

import argparse

import torch
import torchaudio


def cli_check_audio() -> None:
    """Parse --audio_path from the command line and check that file."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--audio_path", type=str, required=True)
    args = parser.parse_args()

    check_audio_from_file(args.audio_path)


def check_audio_from_file(audio_path: str) -> None:
    """Load the file and print its sample rate (no watermark check any more)."""
    audio_array, sample_rate = load_audio(audio_path)
    print(f"Audio Loaded: {audio_path} | Sample Rate: {sample_rate} Hz")


def load_audio(audio_path: str) -> tuple[torch.Tensor, int]:
    """Return the audio as a mono 1-D tensor together with its sample rate."""
    audio_array, sample_rate = torchaudio.load(audio_path)
    # torchaudio returns (channels, samples); average the channels to get mono.
    audio_array = audio_array.mean(dim=0)
    return audio_array, int(sample_rate)


if __name__ == "__main__":
    cli_check_audio()
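
An alternative to removing silentcipher would be to keep its pins and install the matching torch version from the CUDA index instead. The sketch below reads the conflict lines pip printed earlier and builds that install command. It assumes the cu121 index still hosts 2.4.0 builds for your platform, and the names `pinned_requirements` and `install_command` are hypothetical helpers, not pip APIs.

```python
import re

# Matches pip conflict lines such as:
# "silentcipher 1.0.5 requires torch==2.4.0, but you have torch 2.5.1+cu121 ..."
CONFLICT = re.compile(
    r"^\S+ \S+ requires (?P<pkg>[A-Za-z0-9_.-]+)==(?P<ver>[^,\s]+), but you have"
)


def pinned_requirements(pip_output: str) -> dict[str, str]:
    """Collect the exact-version pins named in pip's conflict messages."""
    pins: dict[str, str] = {}
    for line in pip_output.splitlines():
        match = CONFLICT.match(line.strip())
        if match:
            pins[match.group("pkg")] = match.group("ver")
    return pins


def install_command(pins: dict[str, str], index_url: str) -> list[str]:
    """Build a pip command that installs the pinned versions from index_url."""
    specs = [f"{pkg}=={ver}" for pkg, ver in sorted(pins.items())]
    return ["pip", "install", *specs, "--index-url", index_url]


if __name__ == "__main__":
    log = (
        "silentcipher 1.0.5 requires torch==2.4.0, but you have torch 2.5.1+cu121 which is incompatible.\n"
        "silentcipher 1.0.5 requires torchaudio==2.4.0, but you have torchaudio 2.5.1+cu121 which is incompatible.\n"
    )
    cmd = install_command(pinned_requirements(log), "https://download.pytorch.org/whl/cu121")
    print(" ".join(cmd))
```

The command is only printed, not executed, so it can be reviewed before running it inside the csm environment.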

(Testing in progress...)

 

Moshi

GitHub - kyutai-labs/moshi: Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

 
