Trying Out OpenAI Whisper

Bluesky_ 2023. 5. 31. 03:13

Introduction to Whisper

Whisper is an AI model released by OpenAI that analyzes speech and converts it into text.

https://openai.com/

https://github.com/openai/whisper

You can run it locally with Python, and open-source projects that make Whisper more convenient to use seem to keep appearing on GitHub.

Installing Whisper

Prerequisites

Prepare the following in advance.

Install Python
Install the CUDA Toolkit so that the GPU can be used
Add cuDNN (CUDA Deep Neural Network Library) to CUDA

Whisper can also be used without installing CUDA and cuDNN.

Installing the CUDA Toolkit

The CUDA Toolkit can be downloaded from the link below.

https://developer.nvidia.com/cuda-toolkit

As of this writing, CUDA Toolkit 12.1 Update 1 is the latest version.

In my case, I selected Windows x86_64 11 exe (local) and installed it.

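Installing cuDNN

cuDNN can be downloaded from the link below.

https://developer.nvidia.com/rdp/cudnn-archive

After downloading and extracting the archive you get bin, include, and lib folders; copy these folders over the CUDA Toolkit install location.

In my case, the CUDA Toolkit location is as follows.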

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
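
If you would rather script the copy step than drag folders around, it could be sketched roughly as below. This is only a sketch: the function name merge_cudnn_into_cuda is my own, both paths are whatever applies on your machine, and writing under Program Files normally needs an elevated (administrator) prompt.

```python
import shutil
from pathlib import Path


def merge_cudnn_into_cuda(cudnn_dir, cuda_dir):
    """Copy cuDNN's bin/include/lib folders over the matching CUDA Toolkit folders.

    Returns the names of the folders that were actually copied.
    """
    cudnn_dir, cuda_dir = Path(cudnn_dir), Path(cuda_dir)
    copied = []
    for name in ("bin", "include", "lib"):
        source = cudnn_dir / name
        if not source.is_dir():
            continue  # this cuDNN archive has no such folder, skip it
        # dirs_exist_ok=True merges into an existing folder (Python 3.8+)
        shutil.copytree(source, cuda_dir / name, dirs_exist_ok=True)
        copied.append(name)
    return copied
```

Calling it with the extracted cuDNN folder as the first argument and the CUDA Toolkit path shown above as the second would merge the files while leaving the existing CUDA files in place.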

If CUDA was installed correctly, you can verify it with the `nvcc --version` command.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:36:15_Pacific_Daylight_Time_2023
Cuda compilation tools, release 12.1, V12.1.105
Build cuda_12.1.r12.1/compiler.32688072_0
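
The same check can be done from Python if that is more convenient. The sketch below assumes nvcc prints a "release X.Y" line like the output above; the function names are hypothetical helpers of mine, not part of any library.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Return the CUDA release (e.g. "12.1") found in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+\.\d+)", output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run nvcc if it is on PATH and return its release string, otherwise None."""
    if shutil.which("nvcc") is None:
        return None  # CUDA Toolkit not installed or not on PATH
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)
```

installed_cuda_release() returns None when nvcc is not found, which is also what you would see on a machine without the CUDA Toolkit.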

Setting up the Whisper project

Once the prerequisites are done, create a Python project and set up whisper in it.

whisper has the following requirements.

Python 3.8 - 3.11
PyTorch 1.10.x or a later version
(optional) a GPU with CUDA support
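
A quick way to check these requirements before installing anything might look like this. It is a minimal sketch: the 3.8 to 3.11 range is taken from the list above, and the helper names are my own.

```python
import importlib.util
import sys

SUPPORTED_MIN = (3, 8)   # lower bound from the requirements above
SUPPORTED_MAX = (3, 11)  # upper bound from the requirements above


def python_supported(version=None):
    """Return True if the (major, minor) version falls inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def torch_present():
    """Return True if a torch package can be found, without importing it."""
    return importlib.util.find_spec("torch") is not None
```

Calling python_supported() with no argument checks the interpreter that is currently running.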

Python can be downloaded and installed from the Python website.

https://www.python.org/

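The official guide uses pip, but managing versions that way is inconvenient, so I use pipenv here.

I introduced pipenv in an earlier post.

2020.08.25 - [Study/Python] - Setting up a Python development environment, using pipenv

First, install pipenv.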

pip install pipenv

Then create a project directory in a suitable location and run pipenv install there; a Pipfile is generated in that directory.

mkdir whisper
cd whisper
pipenv install

Enter the environment with pipenv shell.

pipenv shell

Once inside, the currently active environment is shown roughly as below, and everything installed from here on is managed in this environment.

C:\Users\bluesky\git\blueskyPythonStudy\whisper>pipenv shell
Launching subshell in virtual environment...
Microsoft Windows [Version 10.0.22621.1702]
(c) Microsoft Corporation. All rights reserved.

(whisper-DY7r8E99) C:\Users\bluesky\git\blueskyPythonStudy\whisper>

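So to re-enter this environment later, run pipenv shell again from the directory that contains its Pipfile. With the environment active, install whisper with one of the following commands.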

pipenv install openai-whisper
# or
pipenv install -e git+https://github.com/openai/whisper.git#egg=openai-whisper

Once the command above finishes, the installation is complete.

You can check what was installed with the `pipenv graph` command.

(whisper-DY7r8E99) C:\Users\bluesky\git\blueskyPythonStudy\whisper>pipenv graph
cuda-python==12.1.0
  - cython [required: Any, installed: 0.29.35]
openai-whisper==20230314
  - ffmpeg-python [required: ==0.2.0, installed: 0.2.0]
    - future [required: Any, installed: 0.18.3]
  - more-itertools [required: Any, installed: 9.1.0]
  - numba [required: Any, installed: 0.57.0]
    - llvmlite [required: >=0.40.0dev0,<0.41, installed: 0.40.0]
    - numpy [required: >=1.21,<1.25, installed: 1.24.3]
  - numpy [required: Any, installed: 1.24.3]
  - tiktoken [required: ==0.3.1, installed: 0.3.1]
    - regex [required: >=2022.1.18, installed: 2023.5.5]
    - requests [required: >=2.26.0, installed: 2.31.0]
      - certifi [required: >=2017.4.17, installed: 2023.5.7]
      - charset-normalizer [required: >=2,<4, installed: 3.1.0]
      - idna [required: >=2.5,<4, installed: 3.4]
      - urllib3 [required: >=1.21.1,<3, installed: 2.0.2]
  - torch [required: Any, installed: 2.0.1]
    - filelock [required: Any, installed: 3.12.0]
    - jinja2 [required: Any, installed: 3.1.2]
      - MarkupSafe [required: >=2.0, installed: 2.1.2]
    - networkx [required: Any, installed: 3.1]
    - sympy [required: Any, installed: 1.12]
      - mpmath [required: >=0.19, installed: 1.3.0]
    - typing-extensions [required: Any, installed: 4.6.2]
  - tqdm [required: Any, installed: 4.65.0]
    - colorama [required: Any, installed: 0.4.6]
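
If you only want the versions of a few packages rather than the whole graph, the standard library can report them from inside the environment. A small sketch (installed_version is a hypothetical helper name):

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

For example, installed_version("openai-whisper") and installed_version("torch") should report the same versions that pipenv graph lists for the active environment.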

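Trying out Whisper

Grab an audio file to test with (an mp4 such as a movie, an mp3 or wav file containing speech, and so on) and run a simple test.

In my case, I used an mp3 file.

You can write Python code for this, but whisper can also be used from the command line.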
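
For reference, the Python route could look roughly like the sketch below. It uses whisper.load_model and model.transcribe from the whisper package; the model name "base", the language code "ko", and the helper names are arbitrary choices of mine, and the timestamp format only imitates what the command line prints. The rest of this post sticks to the command line.

```python
def format_timestamp(seconds):
    """Format seconds as MM:SS.mmm, similar to the timestamps the CLI prints."""
    total_ms = round(seconds * 1000)
    minutes, remainder_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(remainder_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segment(segment):
    """Render one transcription segment as '[start --> end] text'."""
    start = format_timestamp(segment["start"])
    end = format_timestamp(segment["end"])
    return f"[{start} --> {end}] {segment['text'].strip()}"


def transcribe(path, language="ko", model_name="base"):
    """Transcribe an audio file and return one formatted line per segment."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name)  # downloads the model on first use
    result = model.transcribe(path, language=language)
    return [format_segment(segment) for segment in result["segments"]]
```

Printing each line returned by transcribe("sample.mp3") gives output shaped like the command-line result. The full text without timestamps is also available under the "text" key of the dictionary that model.transcribe returns.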

The command is simple.

whisper [target] --language [language to transcribe]

This is the simplest way to use it.

(whisper-DY7r8E99) C:\Users\bluesky\git\blueskyPythonStudy\whisper>whisper sample.mp3 --language Korean
C:\Users\bluesky\.virtualenvs\whisper-DY7r8E99\Lib\site-packages\whisper\timing.py:57: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  @numba.jit
C:\Users\bluesky\.virtualenvs\whisper-DY7r8E99\Lib\site-packages\whisper\transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
[00:00.000 --> 00:19.000]  월요일에는 아마 맡으지 않을까
[00:19.000 --> 00:25.000]  화요일도 성급할 보이지 않그래
[00:25.000 --> 00:32.000]  수요일은 뭔가 정정한 느낌 목요일은
[00:32.000 --> 00:36.000]  그냥 내가 왠지 싫어
[00:36.000 --> 00:42.000]  오 이번 주 금요일
[00:42.000 --> 00:47.000]  오 금요일의 시간 어때요
[00:47.000 --> 00:51.000]  주말까지 기다리긴 힘들어
[00:51.000 --> 00:59.000]  시간은 달려라 시계로 못해 고 싶지만
[00:59.000 --> 01:01.000]  마인 컨트롤
[01:01.000 --> 01:06.000]  잊혀가 달콤해 이 남자 도대체 뭐야
[01:06.000 --> 01:12.000]  사랑에 빠지지 않고 못 빼기겠어
[01:12.000 --> 01:18.000]  언젠가 내 맘은 저기 시갯바로 위에 올라타
[01:18.000 --> 01:24.000]  한간씩 그대에게 더 가까이
[01:48.000 --> 01:54.000]  오 이번 주 금요일
[01:54.000 --> 02:00.000]  오 금요일의 시간 어때요
[02:00.000 --> 02:06.000]  딱히 먹고 싶은 영화는 없지만
[02:06.000 --> 02:12.000]  딱히 먹고 싶은 메뉴는 없지만
[02:12.000 --> 02:15.000]  주말까지 기다리긴 힘들어
[02:15.000 --> 02:23.000]  시간은 달려라 시계로 못해 고 싶지만
[02:23.000 --> 02:24.000]  마인 컨트롤
[02:24.000 --> 02:30.000]  잊혀가 달콤해 이 남자 도대체 뭐야
[02:30.000 --> 02:36.000]  사랑에 빠지지 않고 못 빼기겠어
[02:36.000 --> 02:42.000]  언젠가 내 맘은 저기 시갯바로 위에 올라타
[02:42.000 --> 02:48.000]  한간씩 그대에게 더 가까이
[02:48.000 --> 02:52.000]  너만 게 어울린 것 같아
[02:52.000 --> 02:54.000]  이 여자 도대체 뭐야
[02:54.000 --> 03:01.000]  사랑해 주지 않고 못 빼기겠어
[03:01.000 --> 03:06.000]  돌아오는 이번 주 금요일에 만나요
[03:06.000 --> 03:13.000]  그 날을 나만 얻어 갖춰 갖춰요
[03:13.000 --> 03:20.000]  나 더 가까이
[03:20.000 --> 03:25.000]  나만 더 가까이
[03:36.000 --> 03:41.000]  나 더 가까이

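There are some rough spots, but it is impressive how well it transcribed the audio.

The deprecation warning shown during the run will probably go away with newer versions. The second warning says `FP16 is not supported on CPU; using FP32 instead`.

It means speech recognition is running on the CPU because nothing GPU-related has been set up for torch yet; watching Task Manager during the run confirms that CPU resources are being used.

In my case, this was resolved by replacing the installed torch 2.0.1 with a development (nightly) build, version 2.1.0.devXXX, from the cu118 index. Most likely it is the CUDA-enabled build that matters here rather than the development version itself.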

https://pytorch.org/get-started/pytorch-2.0/#requirements

pipenv install torch --index https://download.pytorch.org/whl/nightly/cu118

After installation, the versions changed as follows.

(whisper-DY7r8E99) C:\Users\bluesky\git\blueskyPythonStudy\whisper>pipenv graph
cuda-python==12.1.0
  - cython [required: Any, installed: 0.29.35]
openai-whisper==20230314
  - ffmpeg-python [required: ==0.2.0, installed: 0.2.0]
    - future [required: Any, installed: 0.18.3]
  - more-itertools [required: Any, installed: 9.1.0]
  - numba [required: Any, installed: 0.57.0]
    - llvmlite [required: >=0.40.0dev0,<0.41, installed: 0.40.0]
    - numpy [required: >=1.21,<1.25, installed: 1.24.3]
  - numpy [required: Any, installed: 1.24.3]
  - tiktoken [required: ==0.3.1, installed: 0.3.1]
    - regex [required: >=2022.1.18, installed: 2023.5.5]
    - requests [required: >=2.26.0, installed: 2.31.0]
      - certifi [required: >=2017.4.17, installed: 2023.5.7]
      - charset-normalizer [required: >=2,<4, installed: 3.1.0]
      - idna [required: >=2.5,<4, installed: 3.4]
      - urllib3 [required: >=1.21.1,<3, installed: 2.0.2]
  - torch [required: Any, installed: 2.1.0.dev20230530+cu118]
    - filelock [required: Any, installed: 3.12.0]
    - fsspec [required: Any, installed: 2023.5.0]
    - jinja2 [required: Any, installed: 3.1.2]
      - MarkupSafe [required: >=2.0, installed: 2.1.2]
    - networkx [required: Any, installed: 3.1]
    - sympy [required: Any, installed: 1.12]
      - mpmath [required: >=0.19, installed: 1.3.0]
    - typing-extensions [required: Any, installed: 4.6.2]
  - tqdm [required: Any, installed: 4.65.0]
    - colorama [required: Any, installed: 0.4.6]

The torch version changed, and when I ran the transcription again it used the GPU properly and was much faster.
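
To confirm from Python that torch actually sees the GPU, something like the following can be run inside the environment. This is a sketch: torch.cuda.is_available() and torch.version.cuda are real torch attributes, but the helper names are mine, and the "+cu" check on the version string is only a heuristic based on the version shown above (some builds carry CUDA support without that suffix).

```python
def is_cuda_build(torch_version):
    """Guess from a version like '2.1.0.dev20230530+cu118' whether it is a CUDA build."""
    _, _, local = torch_version.partition("+")
    return local.startswith("cu")


def gpu_report():
    """Collect basic facts about the installed torch and its GPU support."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
    }
```

If cuda_available is False, whisper falls back to the CPU and the FP16 warning from earlier shows up again.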

To see which command-line options are available, use the help option.

whisper -h
# or
whisper --help
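
A few of the options from the help output can be combined in a script. The sketch below only builds the command list; --model, --device, and --output_format are options I know from the whisper CLI help, while build_whisper_command is a hypothetical helper and the values passed to it are just examples.

```python
def build_whisper_command(audio, language, model=None, device=None, output_format=None):
    """Build the argument list for the whisper CLI, adding only the options given."""
    command = ["whisper", audio, "--language", language]
    if model:
        command += ["--model", model]  # e.g. "tiny", "base", "small"
    if device:
        command += ["--device", device]  # e.g. "cpu" or "cuda"
    if output_format:
        command += ["--output_format", output_format]  # e.g. "txt" or "srt"
    return command
```

The returned list can be passed to subprocess.run(command, check=True) from within the pipenv environment, where the whisper executable is on PATH.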