SesameAILabs - CSM
GitHub - SesameAILabs/csm: A Conversational Speech Generation Model
A Conversational Speech Generation Model. Contribute to SesameAILabs/csm development by creating an account on GitHub.
github.com
If you follow the setup in the GitHub repository as-is, the CPU build of PyTorch gets installed.
(csm) D:\workspace\csm>pip install torchtriton
ERROR: Could not find a version that satisfies the requirement torchtriton (from versions: none)
ERROR: No matching distribution found for torchtriton
(csm) D:\workspace\csm>nvidia-smi
Sun Mar 16 21:57:35 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 566.36 Driver Version: 566.36 CUDA Version: 12.7 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Driver-Model | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1660 Ti WDDM | 00000000:01:00.0 On | N/A |
| N/A 60C P0 23W / 80W | 1624MiB / 6144MiB | 2% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
(csm) D:\workspace\csm>python -c "import torch; print(torch.__version__)"
2.4.0+cpu
(csm) D:\workspace\csm>python -c "import torch; print(torch.cuda.is_available())"
False
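The `+cpu` suffix in the version string above is what gives the problem away. As a minimal sketch (the helper names `torch_build_variant` and `is_cuda_build` are made up for this post, not part of PyTorch), the build label can be read off the version string without importing torch at all:

```python
import re


def torch_build_variant(version: str) -> str:
    """Return the local build label of a PyTorch version string.

    '2.4.0+cpu' -> 'cpu', '2.5.1+cu121' -> 'cu121'.
    A version without a '+label' suffix returns 'unknown'.
    """
    _, sep, local = version.partition("+")
    return local if sep and local else "unknown"


def is_cuda_build(version: str) -> bool:
    # Assumption: CUDA wheels carry a 'cu<digits>' label, as in the logs here.
    return re.fullmatch(r"cu\d+", torch_build_variant(version)) is not None


if __name__ == "__main__":
    for v in ("2.4.0+cpu", "2.5.1+cu121"):
        print(v, "->", torch_build_variant(v), "| CUDA build:", is_cuda_build(v))
```

In a real environment you would pass `torch.__version__` to these helpers. A version string with no label at all is reported as `unknown` rather than guessed.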
(csm) D:\workspace\csm>pip uninstall torch torchvision torchaudio -y
Found existing installation: torch 2.4.0
Uninstalling torch-2.4.0:
Successfully uninstalled torch-2.4.0
WARNING: Skipping torchvision as it is not installed.
Found existing installation: torchaudio 2.4.0
Uninstalling torchaudio-2.4.0:
Successfully uninstalled torchaudio-2.4.0
(csm) D:\workspace\csm>pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
...
Installing collected packages: sympy, torch, torchvision, torchaudio
Attempting uninstall: sympy
Found existing installation: sympy 1.13.3
Uninstalling sympy-1.13.3:
Successfully uninstalled sympy-1.13.3
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
silentcipher 1.0.5 requires torch==2.4.0, but you have torch 2.5.1+cu121 which is incompatible.
silentcipher 1.0.5 requires torchaudio==2.4.0, but you have torchaudio 2.5.1+cu121 which is incompatible.
Successfully installed sympy-1.13.1 torch-2.5.1+cu121 torchaudio-2.5.1+cu121 torchvision-0.20.1+cu121
(csm) D:\workspace\csm>python -c "import torch; print(torch.cuda.is_available())"
True
(csm) D:\workspace\csm>pip uninstall silentcipher
Found existing installation: silentcipher 1.0.5
Uninstalling silentcipher-1.0.5:
Would remove:
c:\users\bhjo0\anaconda3\envs\csm\lib\site-packages\silentcipher-1.0.5.dist-info\*
c:\users\bhjo0\anaconda3\envs\csm\lib\site-packages\silentcipher\*
Proceed (Y/n)? y
Successfully uninstalled silentcipher-1.0.5
(csm) D:\workspace\csm>python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"
2.5.1+cu121
True
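The conflict pip reported during the install comes from silentcipher pinning `torch==2.4.0` and `torchaudio==2.4.0`. The sketch below parses those conflict lines and checks a pin in a simplified way. The names `parse_conflict` and `satisfies_pin` are hypothetical helpers, and the pin check only covers the plain `==X.Y.Z` case: under PEP 440 a pin without a local label ignores the `+cu121` part of the installed version, so only the public version needs to match.

```python
import re
from typing import Dict, Optional

# Matches lines such as:
# "silentcipher 1.0.5 requires torch==2.4.0, but you have torch 2.5.1+cu121 which is incompatible."
CONFLICT_RE = re.compile(
    r"^(?P<pkg>\S+) (?P<pkg_ver>\S+) requires "
    r"(?P<dep>[A-Za-z0-9_.\-]+)==(?P<pinned>\S+?), "
    r"but you have (?P=dep) (?P<installed>\S+) which is incompatible\.$"
)


def parse_conflict(line: str) -> Optional[Dict[str, str]]:
    """Extract package, dependency, pinned and installed versions from one line."""
    m = CONFLICT_RE.match(line.strip())
    return m.groupdict() if m else None


def satisfies_pin(installed: str, pinned: str) -> bool:
    # Simplified: drop the local label (+cpu, +cu121) and compare exactly.
    return installed.split("+", 1)[0] == pinned


if __name__ == "__main__":
    line = ("silentcipher 1.0.5 requires torch==2.4.0, "
            "but you have torch 2.5.1+cu121 which is incompatible.")
    info = parse_conflict(line)
    print(info)
    print(satisfies_pin(info["installed"], info["pinned"]))
```

By this logic a 2.4.0 build with a CUDA label would satisfy the pin in principle. Whether installing such a build instead of uninstalling silentcipher works on this machine was not tried here.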
watermarking.py uses silentcipher, but because of the version conflict I uninstalled the package and modified the source as follows.
import argparse

import torch
import torchaudio


def cli_check_audio() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--audio_path", type=str, required=True)
    args = parser.parse_args()
    check_audio_from_file(args.audio_path)


def check_audio_from_file(audio_path: str) -> None:
    audio_array, sample_rate = load_audio(audio_path)
    print(f"Audio Loaded: {audio_path} | Sample Rate: {sample_rate} Hz")


def load_audio(audio_path: str) -> tuple[torch.Tensor, int]:
    audio_array, sample_rate = torchaudio.load(audio_path)
    # Downmix to mono by averaging over the channel dimension.
    audio_array = audio_array.mean(dim=0)
    return audio_array, int(sample_rate)


if __name__ == "__main__":
    cli_check_audio()
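Stripping silentcipher out of the file entirely is one option. Another, shown here only as a sketch and not what was done above, is to treat the package as optional and skip watermarking when it is not installed. `has_module` and `watermark_if_available` are hypothetical names, and `apply_watermark` stands in for whatever watermarking call the project actually uses.

```python
import importlib.util
from typing import Callable, TypeVar

T = TypeVar("T")


def has_module(name: str) -> bool:
    """True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def watermark_if_available(
    audio: T,
    apply_watermark: Callable[[T], T],
    module: str = "silentcipher",
) -> T:
    # Return the audio unchanged when the optional dependency is missing.
    if not has_module(module):
        return audio
    return apply_watermark(audio)


if __name__ == "__main__":
    print("silentcipher installed:", has_module("silentcipher"))
```

With this guard the same source runs whether or not silentcipher is present, at the cost of silently producing unwatermarked audio when it is absent.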
(Testing in progress...)
Moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. - kyutai-labs/moshi
github.com