[Error Fix] AttributeError: 'NoneType' object has no attribute 'dtype'
hello_
2024. 7. 23. 11:08
!pip install transformers
!pip install openpyxl # needed to read Excel files
# Suppress warning messages
import warnings
warnings.filterwarnings(action='ignore')
import numpy as np
import pandas as pd
import tensorflow as tf
import transformers
print(transformers.__version__) # 4.42.4
print(tf.__version__) # 2.15.0
# Load the data
comment_train = pd.read_excel('https://github.com/gzone2000/TEMP_TEST/raw/master/A_comment_train.xlsx', engine='openpyxl')
comment_test = pd.read_excel('https://github.com/gzone2000/TEMP_TEST/raw/master/A_comment_test.xlsx', engine='openpyxl')
comment = pd.concat([comment_train, comment_test])
# Label encoding: '긍정' (positive) -> 0, '부정' (negative) -> 1
comment['label'] = comment['label'].replace(['긍정', '부정'], [0, 1])
# Split into x (text) and y (label)
x = comment.data.to_list()
y = comment.label.to_list()
# train test split
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, stratify=y, random_state=48)
print(len(x_train), len(x_test), len(y_train), len(y_test))
# Load the pre-trained BERT tokenizer
from transformers import AutoConfig, BertTokenizerFast, TFBertForSequenceClassification
bert_model = 'klue/bert-base'
tokenizer = BertTokenizerFast.from_pretrained(bert_model)
# Tokenize the train and test data with the pre-trained BERT tokenizer
train_encodings = tokenizer(x_train, truncation=True, padding=True)
test_encodings = tokenizer(x_test, truncation=True, padding=True)
# Convert the train and test sets into TensorFlow Datasets
train_dataset = tf.data.Dataset.from_tensor_slices((dict(train_encodings), y_train))
train_dataset = train_dataset.shuffle(1000).batch(16).cache().prefetch(tf.data.experimental.AUTOTUNE)
test_dataset = tf.data.Dataset.from_tensor_slices((dict(test_encodings), y_test))
test_dataset = test_dataset.batch(16).cache().prefetch(tf.data.experimental.AUTOTUNE)
# Load the pre-trained BERT model, then compile and train it
from transformers import TFBertForSequenceClassification
model = TFBertForSequenceClassification.from_pretrained(bert_model, num_labels=2, from_pt=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss, metrics=['accuracy'])
model.fit(train_dataset, epochs=1, batch_size=16, validation_data=(test_dataset))
# Predict on test_dataset with the trained model
y_test_pred = model.predict(test_dataset)
df = pd.DataFrame(np.argmax(y_test_pred.logits, axis=1), columns=['predict'])
df['true'] = y_test
print(np.sum(df['true'] == df['predict'])/len(df))
At line 53, I got this error:
AttributeError: 'NoneType' object has no attribute 'dtype'
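My best guess at the cause, sketched as a toy reproduction: from TensorFlow 2.8 on, Keras models have their own compute_loss(x, y, y_pred, sample_weight) method, and as far as I can tell model.compute_loss resolves to that one instead of the Hugging Face loss. Keras calls a user-supplied loss as loss_fn(y_true, y_pred), so the two tensors land in x and y while y_pred stays None. ToyTensor, keras_style_compute_loss, and call_as_loss_fn below are stand-ins I made up for illustration; this is not the actual Keras or transformers source.

```python
class ToyTensor:
    """Hypothetical stand-in for a tensor; only carries a dtype."""
    dtype = "float32"


def keras_style_compute_loss(x=None, y=None, y_pred=None, sample_weight=None):
    # Toy stand-in with the same argument order as the Keras method.
    # It touches y_pred.dtype, which fails when y_pred was never supplied.
    return f"loss for predictions of dtype {y_pred.dtype}"


def call_as_loss_fn(loss_fn, y_true, y_pred):
    # A loss passed to compile() is called with two positional arguments.
    return loss_fn(y_true, y_pred)


message = None
try:
    # y_true fills x, y_pred fills y, and the y_pred parameter stays None.
    call_as_loss_fn(keras_style_compute_loss, ToyTensor(), ToyTensor())
except AttributeError as err:
    message = str(err)

print(message)
```

If this reading is right, the error says nothing about the data itself; it only means the function handed to compile() expects different arguments than Keras passes to a loss.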
Following the link below, I changed the loss from model.compute_loss to model.hf_compute_loss, and that fixed it.
AttributeError: 'NoneType' object has no attribute 'dtype' · Issue #1 · yashinaniya/NLP_Projects
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5, epsilon=1e-08) model.compile(optimizer=optimizer, loss=model.compute_loss, metrics=['accuracy']) model.fit(train_dataset.shuffle(100).batch(...
github.com
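The change itself is one attribute name in the compile() call. A minimal sketch of a small helper that prefers hf_compute_loss and falls back to compute_loss when the attribute is missing (my assumption is that some older transformers releases only have the latter). pick_hf_loss, NewStyleModel, and OldStyleModel are hypothetical names used only so the snippet runs without downloading a model.

```python
def pick_hf_loss(model):
    """Return model.hf_compute_loss when it exists, else model.compute_loss."""
    hf_loss = getattr(model, "hf_compute_loss", None)
    if callable(hf_loss):
        return hf_loss
    return model.compute_loss


# Tiny stand-ins that only mimic which loss methods a model exposes.
class NewStyleModel:
    def compute_loss(self, x=None, y=None, y_pred=None, sample_weight=None):
        return "keras"

    def hf_compute_loss(self, labels, logits):
        return "hf"


class OldStyleModel:
    def compute_loss(self, labels, logits):
        return "hf"


# Show which method name the helper selects for each stand-in.
print(pick_hf_loss(NewStyleModel()).__name__)
print(pick_hf_loss(OldStyleModel()).__name__)
```

In the script above, this would be used as model.compile(optimizer=optimizer, loss=pick_hf_loss(model), metrics=['accuracy']). Writing loss=model.hf_compute_loss directly, as I did, works just as well for the versions printed at the top.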
Yesterday I couldn't get rid of this error, but today it was fixed easily. I'm sure I tried the same thing yesterday, so why didn't it work then...