Description
I'm trying to run this in Google Colab.
TensorFlow version: 2.5.0
I have also mounted the dataset.
I've also tried TF v1, which doesn't work either.
I've followed the TF v2 conversion done in https://github.com/ajenningsfrankston/graph_kt.git, which also throws multiple errors.
Here are the exact errors I get.
Running with:
!python /content/graph_kt/main.py --dataset /content/data/assist09_3/assist09_3 \
  --n_hop 3 \
  --log_dir /content/logs \
  --checkpoint_dir /content/checkpoint \
  --skill_neighbor_num 4 \
  --question_neighbor_num 4 \
  --hist_neighbor_num 3 \
  --next_neighbor_num 4 \
  --model hsei \
  --lr 0.001 \
  --att_bound 0.7 \
  --sim_emb question_emb \
  --dropout_keep_probs [0.8,0.8,1]
Output:
2021-07-17 18:06:30.791642: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
hsei
{'data_dir': 'data', 'log_dir': '/content/logs', 'train': 1, 'hidden_neurons': [200, 100], 'lr': 0.001, 'lr_decay': 0.92, 'checkpoint_dir': '/content/checkpoint', 'dropout_keep_probs': '[0.8,0.8,1]', 'aggregator': 'sum', 'model': 'hsei', 'l2_weight': 1e-08, 'limit_max_len': 200, 'limit_min_len': 3, 'dataset': '/content/data/assist09_3/assist09_3', 'field_size': 3, 'embedding_size': 100, 'max_step': 200, 'input_trans_size': 100, 'batch_size': 32, 'select_index': [0, 1, 2], 'num_epochs': 150, 'n_hop': 3, 'skill_neighbor_num': 4, 'question_neighbor_num': 4, 'hist_neighbor_num': 3, 'next_neighbor_num': 4, 'att_bound': 0.7, 'sim_emb': 'question_emb', 'tag': 1626545192.6224916}
original test seqs num:893
167
17737
2021-07-17 18:06:40.247804: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-17 18:06:40.248722: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcuda.so.1
2021-07-17 18:06:40.277228: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-17 18:06:40.277817: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:00:04.0 name: Tesla T4 computeCapability: 7.5
coreClock: 1.59GHz coreCount: 40 deviceMemorySize: 14.75GiB deviceMemoryBandwidth: 298.08GiB/s
2021-07-17 18:06:40.277864: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-17 18:06:40.280330: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-17 18:06:40.280429: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
2021-07-17 18:06:40.282030: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcufft.so.10
2021-07-17 18:06:40.282367: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcurand.so.10
2021-07-17 18:06:40.284136: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusolver.so.10
2021-07-17 18:06:40.284943: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcusparse.so.11
2021-07-17 18:06:40.285182: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudnn.so.8
2021-07-17 18:06:40.285302: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-17 18:06:40.285894: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-17 18:06:40.286410: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-17 18:06:40.286467: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-07-17 18:06:40.773872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-17 18:06:40.773926: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264] 0
2021-07-17 18:06:40.773942: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0: N
2021-07-17 18:06:40.774118: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-17 18:06:40.774763: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-17 18:06:40.775317: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-07-17 18:06:40.775848: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 13837 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
hsei
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/array_ops.py:5049: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version.
Instructions for updating:
The validate_indices argument has no effect. Indices are always validated on CPU and never validated on GPU.
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/legacy_tf_layers/core.py:171: UserWarning: tf.layers.dense is deprecated and will be removed in a future version. Please use tf.keras.layers.Dense instead.
warnings.warn('tf.layers.dense is deprecated and '
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer_v1.py:1692: UserWarning: layer.apply is deprecated and will be removed in a future version. Please use layer.call method instead.
warnings.warn('layer.apply is deprecated and '
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/legacy_rnn/rnn_cell_impl.py:708: UserWarning: tf.nn.rnn_cell.BasicLSTMCell is deprecated and will be removed in a future version. This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
warnings.warn("tf.nn.rnn_cell.BasicLSTMCell is deprecated and will be "
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/legacy_rnn/rnn_cell_impl.py:909: UserWarning: tf.nn.rnn_cell.LSTMCell is deprecated and will be removed in a future version. This class is equivalent as tf.keras.layers.LSTMCell, and will be replaced by that in Tensorflow 2.0.
warnings.warn("tf.nn.rnn_cell.LSTMCell is deprecated and will be "
/usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/engine/base_layer_v1.py:1700: UserWarning: layer.add_variable is deprecated and will be removed in a future version. Please use layer.add_weight method instead.
warnings.warn('layer.add_variable is deprecated and '
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/keras/layers/legacy_rnn/rnn_cell_impl.py:987: calling Zeros.init (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_8_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/GatherV2_8_grad/Reshape:0", shape=(None, None, 100), dtype=float32), dense_shape=Tensor("gradients/GatherV2_8_grad/Cast:0", shape=(3,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
"shape. This may consume a large amount of memory." % value)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/GatherV2_7_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients/concat_2_grad/Slice_1:0", shape=(None, None, None), dtype=float32), dense_shape=Tensor("gradients/concat_2_grad/Shape:0", shape=(3,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
"shape. This may consume a large amount of memory." % value)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients_1/GatherV2_8_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients_1/GatherV2_8_grad/Reshape:0", shape=(None, None, 100), dtype=float32), dense_shape=Tensor("gradients_1/GatherV2_8_grad/Cast:0", shape=(3,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
"shape. This may consume a large amount of memory." % value)
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients_1/GatherV2_7_grad/Reshape_1:0", shape=(None,), dtype=int32), values=Tensor("gradients_1/concat_2_grad/tuple/control_dependency:0", shape=(None, None, None), dtype=float32), dense_shape=Tensor("gradients_1/concat_2_grad/Shape:0", shape=(3,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
"shape. This may consume a large amount of memory." % value)
initialize complete
2021-07-17 18:07:00.410532: I tensorflow/core/platform/profile_utils/cpu_utils.cc:114] CPU Frequency: 2000189999 Hz
0% 0/150 [00:00<?, ?it/s]epoch: 0
/content/graph_kt/data_process.py:253: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
target_answers = pad_sequences(np.array([[j[-1] - feature_size for j in i[1:]] for i in seqs]), maxlen=max_step - 1,
2021-07-17 18:07:08.694893: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublas.so.11
2021-07-17 18:07:09.217026: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcublasLt.so.11
0% 0/150 [00:08<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1375, in _do_call
return fn(*args)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1360, in _run_fn
target_list, run_metadata)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1453, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [32,199,1,100] vs. shape[1] = [6400,199,3,100]
[[{{node concat_4}}]]
[[Sum_2/_201]]
(1) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [32,199,1,100] vs. shape[1] = [6400,199,3,100]
[[{{node concat_4}}]]
0 successful operations.
0 derived errors ignored.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/content/graph_kt/main.py", line 76, in <module>
main()
File "/content/graph_kt/main.py", line 69, in main
train(args,train_dkt)
File "/content/graph_kt/train.py", line 50, in train
binary_pred, pred, loss = model.train(sess,features_answer_index,target_answers,seq_lens,hist_neighbor_index)
File "/content/graph_kt/model.py", line 393, in train
[self.binary_pred, self.pred, self.loss, self.train_op, self.flat_target_correctness], input_feed)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 968, in run
run_metadata_ptr)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1191, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1369, in _do_run
run_metadata)
File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/client/session.py", line 1394, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [32,199,1,100] vs. shape[1] = [6400,199,3,100]
[[node concat_4 (defined at content/graph_kt/model.py:157) ]]
[[Sum_2/_201]]
(1) Invalid argument: ConcatOp : Dimensions of inputs should match: shape[0] = [32,199,1,100] vs. shape[1] = [6400,199,3,100]
[[node concat_4 (defined at content/graph_kt/model.py:157) ]]
0 successful operations.
0 derived errors ignored.
Errors may have originated from an input operation.
Input Source operations connected to node concat_4:
Reshape_72 (defined at content/graph_kt/model.py:240)
Input Source operations connected to node concat_4:
Reshape_72 (defined at content/graph_kt/model.py:240)
Original stack trace for 'concat_4':
File "content/graph_kt/main.py", line 76, in <module>
main()
File "content/graph_kt/main.py", line 69, in main
train(args,train_dkt)
File "content/graph_kt/train.py", line 18, in train
model = GIKT(args)
File "content/graph_kt/model.py", line 45, in __init__
self.build_model()
File "content/graph_kt/model.py", line 157, in build_model
Nh = tf.concat([tf.expand_dims(output_series, 2), self.hist_neighbors_features], 2) # [self.batch_size,max_step,M+1,feature_trans_size]]
File "usr/local/lib/python3.7/dist-packages/tensorflow/python/util/dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
File "usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/array_ops.py", line 1768, in concat
return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
File "usr/local/lib/python3.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 1228, in concat_v2
"ConcatV2", values=values, axis=axis, name=name)
File "usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
File "usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py", line 3565, in _create_op_internal
op_def=op_def)
File "usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py", line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)
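For reference, tf.concat along axis 2 requires every other dimension of its inputs to match. Here the first input (tf.expand_dims(output_series, 2)) is [32,199,1,100], while self.hist_neighbors_features (coming from Reshape_72 at model.py:240) is [6400,199,3,100], so only axis 0 disagrees. Comparing with the printed args: 32 is batch_size, 199 is max_step - 1, 100 matches embedding_size/input_trans_size, and 6400 is exactly batch_size * max_step (32 × 200). I am assuming the 3 is hist_neighbor_num (the M in the comment on line 157). That pattern hints that the reshape at model.py:240 folds an extra time axis into the batch axis, but this is only inferred from the numbers. The minimal pure-Python sketch below reproduces the shape rule and the mismatch; the helper concat_shape is hypothetical and not part of graph_kt or TensorFlow.

```python
from typing import Sequence, Tuple


def concat_shape(shapes: Sequence[Sequence[int]], axis: int) -> Tuple[int, ...]:
    """Hypothetical helper: result shape of concatenating `shapes` along `axis`.

    Mirrors the rule tf.concat enforces: equal ranks, and every dimension
    except `axis` must match. Raises ValueError otherwise.
    """
    rank = len(shapes[0])
    if any(len(s) != rank for s in shapes):
        raise ValueError("all inputs must have the same rank")
    if not -rank <= axis < rank:
        raise ValueError(f"axis {axis} out of range for rank {rank}")
    axis %= rank  # turn a negative axis into its positive equivalent
    for d in range(rank):
        if d != axis and len({s[d] for s in shapes}) > 1:
            raise ValueError(f"dimension {d} mismatch: {[s[d] for s in shapes]}")
    out = list(shapes[0])
    out[axis] = sum(s[axis] for s in shapes)
    return tuple(out)


# Values taken from the printed args dict (assumption: M == hist_neighbor_num).
batch_size, max_step, hist_neighbor_num, embedding_size = 32, 200, 3, 100

expected = [
    (batch_size, max_step - 1, 1, embedding_size),                  # expand_dims(output_series, 2)
    (batch_size, max_step - 1, hist_neighbor_num, embedding_size),  # hist_neighbors_features
]
print(concat_shape(expected, axis=2))  # (32, 199, 4, 100)

# Shapes actually reported by the ConcatOp error.
observed = [(32, 199, 1, 100), (6400, 199, 3, 100)]
try:
    concat_shape(observed, axis=2)
except ValueError as err:
    print(err)  # dimension 0 mismatch: [32, 6400]

print(6400 == batch_size * max_step)  # True
```

If the inputs had the intended shapes, the result would be [32,199,4,100], i.e. M+1 along axis 2 as the comment on line 157 describes.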
Please help me out.
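Side note on the VisibleDeprecationWarning from data_process.py:253: it is only a warning and is not what stops training. It appears because np.array(...) receives a list of sequences with different lengths; the message itself suggests dtype=object, and from NumPy 1.24 on this ragged construction raises a ValueError instead. Since Keras' pad_sequences accepts a plain list of sequences, the np.array(...) wrapper could likely be dropped as well. The sketch below only illustrates what padding does to ragged input; pad_post and the toy data are hypothetical, and post-padding/post-truncation is an assumption (the real padding arguments are cut off in the log, and Keras defaults both to 'pre').

```python
from typing import List, Sequence


def pad_post(seqs: Sequence[Sequence[int]], maxlen: int, value: int = 0) -> List[List[int]]:
    """Hypothetical stand-in for pad_sequences(padding='post', truncating='post').

    Keeps at most `maxlen` items from the start of each sequence and fills the
    remainder with `value`, so every row is exactly `maxlen` long.
    """
    padded = []
    for s in seqs:
        row = list(s[:maxlen])                     # drop items past maxlen
        row.extend([value] * (maxlen - len(row)))  # pad at the end
        padded.append(row)
    return padded


# Toy ragged input: sequences of different lengths, like `seqs` in data_process.py.
ragged = [[1, 0, 1], [0, 1], [1, 1, 0, 1, 1]]
print([len(s) for s in ragged])    # [3, 2, 5]
print(pad_post(ragged, maxlen=4))  # [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
```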