Load a pre-trained model from disk with Huggingface Transformers

2024-07-05

huggingface-transformers

From the documentation for from_pretrained, I understand that I don't have to download the pretrained vectors every time; I can save them and load them from disk with this syntax:

  - a path to a `directory` containing vocabulary files required by the tokenizer, for instance saved using the :func:`~transformers.PreTrainedTokenizer.save_pretrained` method, e.g.: ``./my_model_directory/``.
  - (not applicable to all derived classes, deprecated) a path or url to a single saved vocabulary file if and only if the tokenizer only requires a single vocabulary file (e.g. Bert, XLNet), e.g.: ``./my_model_directory/vocab.txt``.

So, I went to the model hub:

https://huggingface.co/models

I found the model I wanted:

https://huggingface.co/bert-base-cased

I downloaded it from the link they provided to this repository:

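Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is case-sensitive: it makes a difference between english and English.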

I stored it in:

  /my/local/models/cased_L-12_H-768_A-12/

Which contains:

 ./
 ../
 bert_config.json
 bert_model.ckpt.data-00000-of-00001
 bert_model.ckpt.index
 bert_model.ckpt.meta
 vocab.txt

So, now I have the following:

  PATH = '/my/local/models/cased_L-12_H-768_A-12/'
  tokenizer = BertTokenizer.from_pretrained(PATH, local_files_only=True)

And I get this error:

>           raise EnvironmentError(msg)
E           OSError: Can't load config for '/my/local/models/cased_L-12_H-768_A-12/'. Make sure that:
E           
E           - '/my/local/models/cased_L-12_H-768_A-12/' is a correct model identifier listed on 'https://huggingface.co/models'
E           
E           - or '/my/local/models/cased_L-12_H-768_A-12/' is the correct path to a directory containing a config.json file

Similarly, when I link to the config.json directly:

  PATH = '/my/local/models/cased_L-12_H-768_A-12/bert_config.json'
  tokenizer = BertTokenizer.from_pretrained(PATH, local_files_only=True)

        if state_dict is None and not from_tf:
            try:
                state_dict = torch.load(resolved_archive_file, map_location="cpu")
            except Exception:
                raise OSError(
>                   "Unable to load weights from pytorch checkpoint file. "
                    "If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True. "
                )
E               OSError: Unable to load weights from pytorch checkpoint file. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

What should I do differently to get huggingface to use my local pretrained model?

Update to address the comments

YOURPATH = '/somewhere/on/disk/'

name = 'transfo-xl-wt103'
tokenizer = TransfoXLTokenizerFast(name)
model = TransfoXLModel.from_pretrained(name)
tokenizer.save_pretrained(YOURPATH)
model.save_pretrained(YOURPATH)

>>> Please note you will not be able to load the save vocabulary in Rust-based TransfoXLTokenizerFast as they don't share the same structure.
('/somewhere/on/disk/vocab.bin', '/somewhere/on/disk/special_tokens_map.json', '/somewhere/on/disk/added_tokens.json')

So everything is saved, but then...

YOURPATH = '/somewhere/on/disk/'
TransfoXLTokenizerFast.from_pretrained('transfo-xl-wt103', cache_dir=YOURPATH, local_files_only=True)

    "Cannot find the requested files in the cached path and outgoing traffic has been"
ValueError: Cannot find the requested files in the cached path and outgoing traffic has been disabled. To enable model look-ups and downloads online, set 'local_files_only' to False.

Best answer

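Where is the file located relative to your model folder? I believe it has to be a relative path rather than an absolute one. So if the file where you are writing the code is located in 'my/local/', then your code should look like this: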

from transformers import BertTokenizer

PATH = 'models/cased_L-12_H-768_A-12/'
tokenizer = BertTokenizer.from_pretrained(PATH, local_files_only=True)
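
A relative path like this one is resolved against the process's current working directory, which is not necessarily the folder that holds the script. Below is a minimal sketch, using only the standard library, of anchoring the path to an explicit base folder instead. `resolve_model_dir` and the `my/local` base folder are hypothetical names chosen for illustration, not part of the transformers API.

```python
from pathlib import Path


def resolve_model_dir(base_dir, relative_path):
    """Join a base folder and a relative model path into one absolute path.

    Hypothetical helper for illustration only.
    """
    # pathlib uses the separator of the current OS, and a "/" inside
    # the relative part is also accepted on Windows.
    return (Path(base_dir) / relative_path).resolve()


# Assumed layout: the script lives in my/local/ and the models sit below it.
model_dir = resolve_model_dir("my/local", "models/cased_L-12_H-768_A-12/")
print(model_dir)
```

The result can then be passed on as `str(model_dir)`, so the call no longer depends on where the script was launched from.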

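You only need to specify the folder that holds all the files, not the files themselves. I think this is definitely a problem with the PATH. Try changing the style of the slashes: "/" versus "\", which differ between operating systems. Also try using ".", as in ./models/cased_L-12_H-768_A-12/.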
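
Before calling from_pretrained, it can help to confirm that the path really is a folder and to see which files it holds. The sketch below is a minimal version of that check. It assumes the loader looks for `config.json` (as the error message in the question states) and `vocab.txt`. `missing_files` is a hypothetical helper, and the list of expected files is an assumption for illustration, not an exhaustive list of what the library needs.

```python
import tempfile
from pathlib import Path


def missing_files(model_dir, expected=("config.json", "vocab.txt")):
    """Return the expected file names that are absent from model_dir.

    Hypothetical helper; the default `expected` tuple is an assumption.
    """
    path = Path(model_dir)
    if not path.is_dir():
        # Catches the case where a single file was passed instead of a folder.
        raise NotADirectoryError(f"{path} is not a directory")
    present = {entry.name for entry in path.iterdir()}
    return [name for name in expected if name not in present]


# Demo on a throwaway folder that mimics two of the files listed in the question.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("bert_config.json", "vocab.txt"):
        (Path(tmp) / name).touch()
    print(missing_files(tmp))  # ['config.json']
```

On a folder like the one in the question, which holds `bert_config.json` but no `config.json`, the check reports `config.json` as missing. That matches the second bullet of the error message.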
