Medication Inventory Application

An app for tracking medications and dispense history, and for predicting which medications to reorder, built for Main Street Family Medicine, PLLC.

In my role at Main Street Family Medicine, I created an application our nurses use to keep better track of the medications we store in-house. The app also predicts which medications would be best to reorder at any given time, based on our current inventory, dispense history, and other data about our current patient base. The prediction feature saves our nurses the time it would otherwise take to determine manually which medications they need to order.

Below are a few screenshots of the app's pages, along with the code I used to create the prediction model.

When a user navigates to the web application, they are first prompted to enter the password (as seen below):

With the correct password, the app automatically navigates to the second page, titled “Inventory Table.” The table reflects our current medication inventory along with how many times each med has been dispensed in practice history. Dispense history is an important number to have handy, as it influences our nurses’ decisions on when, or whether, to reorder a certain medication. The table is searchable (see the right-side search bar) and can be filtered by the number of medications in stock and/or the number of times each med has been dispensed (see the arrows next to the table column names). These functions improve the usability of the app and make it more efficient for nurses to filter information when making ordering decisions.

Finally, the last page (titled “Prediction Page”) lists the medications my model predicts we will need to order, based on current inventory, dispense history, previous order history, and various current patient demographic data. The code I used to create the model is included below. Also included on this page (though not shown in the screenshot, for visibility purposes) are lists of the frequently dispensed meds we are completely (and almost) out of.

The code below showcases my model for determining which meds we need to order; the numbered list that precedes it summarizes the process. The model was built in Python and the app in R (app code not shown).

  1. Preprocess data (OneHotEncoder and StandardScaler for categorical and numeric data, respectively)
  2. Test/train split
  3. Create an ensemble of several regression models and determine which model had the lowest mean squared error
  4. Tune the hyperparameters of the best model (found in the previous step)
  5. Cross validate
  6. Test
  7. Predict

    # 0. Imports

    #df_final (the training table), current_final (features for the current inventory), and
    #names (the NDC-to-name lookup) are assumed to be loaded earlier (not shown)
    import pandas as pd
    from scipy.stats import randint
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import (BaggingRegressor, GradientBoostingRegressor,
                                  RandomForestRegressor, VotingRegressor)
    from sklearn.linear_model import SGDRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import (RandomizedSearchCV, cross_val_score,
                                         train_test_split)
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    #####################

    # 1. Preprocess

    #standard scale all except NDC

    #set up for categorical
    encoder = OneHotEncoder(handle_unknown='ignore') #OrdinalEncoder gave a best score of NaN because some NDCs appeared in test but not in training; OneHotEncoder with handle_unknown='ignore' prevents this
    categorical_cols = ["NDC"]
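
    #illustration with toy (hypothetical) NDC values: handle_unknown='ignore' means a code
    #first seen at predict time becomes an all-zeros row instead of raising an error
    _demo = OneHotEncoder(handle_unknown='ignore').fit([["0001-23"], ["0002-45"]])
    print(_demo.transform([["9999-99"]]).toarray())  #-> [[0. 0.]]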

    #set up for numerical
    std_scaler = StandardScaler()
    all_cols = df_final.columns.tolist()
    numeric_cols = [col for col in all_cols if col not in categorical_cols]
    numeric_cols.remove('total_ordered') #remove the final output

    #now use a ColumnTransformer to apply each preprocessor to its own columns
    transformer = ColumnTransformer(
            transformers=[
                    ('categories', encoder, categorical_cols),
                    ('numeric', std_scaler, numeric_cols)
            ],
            remainder='drop', verbose_feature_names_out=False)
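
    #optional sanity check (illustrative): fit on just the feature columns and peek at the
    #output names; verbose_feature_names_out=False keeps them free of step-name prefixes
    _features = df_final.drop(columns=['total_ordered'])
    print(transformer.fit(_features).get_feature_names_out()[:5])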

    #####################

    # 2. Split

    X = df_final.loc[:, df_final.columns != 'total_ordered']
    y = df_final['total_ordered']

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
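
    #train_test_split defaults to a 75/25 train/test split; random_state=0 makes it reproducible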

    #####################

    # 3. Ensemble

    #four models that might work well
    clf1 = RandomForestRegressor(random_state=0)
    clf2 = GradientBoostingRegressor()
    clf3 = BaggingRegressor()
    clf4 = SGDRegressor()

    #ensemble
    eclf = VotingRegressor(
        estimators=[('rf', clf1), ('boost', clf2), ('bag', clf3), ('sgd',clf4)])
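
    #VotingRegressor averages the individual models' predictions; the scores below show
    #whether that average beats any single model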

    #fit the preprocessor on the training split so the bare models can be scored
    X_train_trans = transformer.fit_transform(X_train)

    #find cross-validated negative mean squared error for each model and the ensemble
    for clf, label in zip([clf1, clf2, clf3, clf4, eclf], ['Random Forest', 'Gradient Boost', 'Bagging', 'SGD', 'Ensemble']):
        scores = cross_val_score(clf, X_train_trans, y_train, scoring='neg_mean_squared_error', cv=5)
        print("Negative MSE: %0.2f (+/- %0.2f) [%s]" % (scores.mean(), scores.std(), label)) #gradient boost was the best

    ##################### 

    # 4. Hyperparameterize

    #pipeline
    pipeline = Pipeline([('transformer', transformer),
                     ('GradBoost', GradientBoostingRegressor())])

    #list the parameter names available for tuning
    print(pipeline.get_params())

    #don't need to hyperparameterize the transformer
    parameters = {
        'GradBoost__learning_rate': [0.01, 0.1, 0.2, 0.3],
        'GradBoost__n_estimators': randint(1,1000),
        'GradBoost__max_depth': randint(1,10),
        'GradBoost__min_impurity_decrease': [0.0, 0.1, 0.2, 0.3],
    }

    n_iter_search = 5

    random_search = RandomizedSearchCV(pipeline, param_distributions=parameters, n_iter=n_iter_search, n_jobs=-1, verbose=True)
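
    #note: n_iter=5 samples only five parameter combinations; a larger value searches more
    #thoroughly at more compute cost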

    #find best score
    try:
        random_search.fit(X_train, y_train)
        # Find the best score
        best_score = random_search.best_score_
        print("Best score: %0.2f" % best_score)
    except Exception as e:
        print(f"An error occurred: {e}")

    #best score = 0.47. One caveat: without a scoring argument, RandomizedSearchCV scores
    #regressors with R^2 (not negative MSE), so this 0.47 is an R^2 value
    #(for reference, y_train values range from 0 to 1400; see y_train.value_counts())

    #####################

    # 5. Cross validate

    #nested cross-validation: the full randomized search is re-run inside each of the 5 folds
    scores = cross_val_score(random_search, X_train, y_train, cv=5)
    print(scores.mean(), '\t', scores.std())
    # 0.3921064495878645 	 0.08086083233321788 - really happy with this

    #inspect the tuned parameters
    print(random_search.best_estimator_)
    #learning rate = 0.01, max_depth = 2, n_estimators = 246
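
    #the tuned model can also be pulled out of the pipeline programmatically:
    best_gb = random_search.best_estimator_.named_steps['GradBoost']
    print(best_gb.learning_rate, best_gb.max_depth, best_gb.n_estimators)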

    #####################

    # 6. Test 

    #set up pipeline with above parameters
    pipeline_test = Pipeline([('transformer', transformer),
                        ('GradBoost', GradientBoostingRegressor(learning_rate=0.01, max_depth=2, n_estimators=246))])

    #fit and predict
    pipeline_test.fit(X_train, y_train)
    y_pred = pipeline_test.predict(X_test)

    #find negative MSE on the held-out test set
    mse = mean_squared_error(y_test, y_pred)
    neg_mse = -mse
    print("Test negative MSE: %0.2f" % neg_mse)

    #####################

    # 7. Predict

    y_pred_current = pipeline_test.predict(current_final)

    #convert to df
    predictions_df = pd.DataFrame(y_pred_current, columns=['Predicted_total_ordered'])

    #match NDC and name to index
    pred_counts = predictions_df['Predicted_total_ordered'].value_counts()
    #print(pred_counts)

    predictions_df_filter = predictions_df[predictions_df['Predicted_total_ordered']>1]
    print(predictions_df_filter)

    #use the row index to match predictions back to current_final
    #(reset_index guards against current_final having a non-default index)
    matching_rows = current_final.reset_index(drop=True).loc[predictions_df_filter.index]

    #merge name with ndc
    pred_names = pd.merge(names,matching_rows,on=["NDC"])

    #keep only the columns we need
    cols_to_keep6 = ["Name","NDC"]
    pred_names = pred_names[cols_to_keep6]

    #drop rows with duplicate NDCs, then duplicate names
    pred_names = pred_names.drop_duplicates(subset=['NDC'])
    pred_names = pred_names.drop_duplicates(subset=['Name'])

    #drop NDC then export to csv for RStudio
    pred_names = pred_names.drop(columns=["NDC"])
    pred_names.to_csv("./prediction_names.csv")

    

NOTE: Thank you to the University of California for the stock medication image used for this project.